Citation Analysis


ILLINOIS

UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

PRODUCTION NOTE

University of Illinois at

Urbana-Champaign Library

Large-scale Digitization Project, 2007.

Library Trends

VOLUME 30 NUMBER 1

SUMMER 1981

University of Illinois

Graduate School of Library and Information Science

Bibliometrics

WILLIAM GRAY POTTER


Issue Editor

CONTENTS

Charles H. Davis
FOREWORD

William Gray Potter
INTRODUCTION

Daniel O. O'Connor
Henry Voos
EMPIRICAL LAWS, THEORY CONSTRUCTION AND BIBLIOMETRICS

William Gray Potter
LOTKA'S LAW REVISITED

M. Carl Drott
BRADFORD'S LAW: THEORY, EMPIRICISM AND THE GAPS BETWEEN

Ronald E. Wyllys
EMPIRICAL AND THEORETICAL BASES OF ZIPF'S LAW

John J. Hubert
GENERAL BIBLIOMETRIC MODELS

Linda C. Smith
CITATION ANALYSIS

D. Kaye Gapen
Sigrid P. Milner
OBSOLESCENCE

Jean Tague
Jamshid Beheshti
Lorna Rees-Potter
THE LAW OF EXPONENTIAL GROWTH: EVIDENCE, IMPLICATIONS AND FORECASTS

Alvin M. Schrader
TEACHING BIBLIOMETRICS

Foreword

MORE MATHEMATICAL THAN MOST, this issue of Library Trends takes a
fresh look at bibliometrics. In choosing both the issue editor and the
contributors, we have deliberately selected individuals who can provide
a new perspective. Additionally, a mix has been sought so that both
theory and potential practical applications would be addressed. Our
purpose is to stimulate greater interest in bibliometrics, while also
making the subject more accessible to a wider audience, including
students. Future issues will continue to address topics in information
science as well as traditional aspects of librarianship. On the chance that
instructors in library and information science might find this particular
issue of value in the classroom, we have extended the print run and
would like to take this opportunity to invite special orders from
educators.
CHARLES H. DAVIS
Editor


Introduction
WILLIAM GRAY POTTER

BIBLIOMETRICS IS, simply put, the study and measurement of the publication patterns of all forms of written communication and their
authors. Though the word is of recent coinage, the practice goes back at
least to the 1920s.
There has been a great increase in the number of publications in
bibliometrics over the past two decades. This increase has not been
accompanied by critical analyses of the field and of the direction of
bibliometrics in general. The purpose of this issue of Library Trends is
to provide analyses of the major concepts of bibliometrics and to indicate its present and future directions. An effort has been made to make
the articles in this issue understandable to persons new to the topic
without depriving those readers already initiated into the mysteries of
bibliometrics of new insights and a measure of controversy. The authors
of these articles are knowledgeable in their topics, but, with a few
exceptions, are not usually associated with bibliometrics. These authors
were chosen to bring some new names and, it is hoped, new ideas to the
literature.
In a general introduction to bibliometrics, Daniel O'Connor and
Henry Voos argue that because bibliometrics has largely been used only
to describe bibliographic phenomena, and is not yet able to explain or
predict these phenomena, it is merely a method, not a theory. They state
that if bibliometrics is to attain the status of a theory, to be able to predict
and explain, and, thus, to become more useful, researchers must concentrate on the causal factors underlying bibliographic phenomena.
William Gray Potter is Acquisitions Librarian, University of Illinois at Urbana-Champaign.

The next four articles deal with the three major laws of
bibliometrics (Lotka's law, Bradford's law, and Zipf's law) and with
attempts to unify these individual laws under one general distribution.
William Potter provides a bibliographic history of Lotka's law and its
application. M. Carl Drott examines Bradford's law and concludes that
more work is needed in exploring the underlying causes behind Bradford's observations. Ronald E. Wyllys provides a discussion of the
origins of Zipf's law, with some interesting observations on the character and context of Zipf himself. John J. Hubert examines efforts to join
the laws of Lotka, Bradford and Zipf into one unified, general model.
While he finds these attempts statistically sound, Hubert faults them for
being too simple, usually with only one dependent variable, and points
to research that attempts to account for more variables and which may
provide more accurate, predictive and useful models.
Citation analysis is perhaps the most written-about topic in bibliometrics. Linda C. Smith provides an extensive review of the literature
and discusses the practical applications of citation analysis.
The rate at which literature becomes obsolete is of interest to both
the information scientist studying the evolution of disciplines and to
practicing librarians concerned with collection management. D. Kaye
Gapen and Sigrid P. Milner have prepared a detailed review of research
in obsolescence.
There has been exponential growth in the number of publications
and it is widely believed that knowledge is also growing, though not at
the same rate as publications. Jean Tague, Jamshid Beheshti and Lorna
Rees-Potter discuss the relationship between the growth of literature
and the growth of knowledge.
Throughout the articles in this issue, there is a recurring theme
which, in essence, says that the traditional bibliometric models and
distributions are too simple to reflect reality accurately. To be useful,
bibliometrics must be able to explain and predict phenomena, not just
to describe them. To do this, more complex models are needed. The
problem is that bibliometrics is already thought too difficult and out of
the reach of most librarians and information scientists. One possible
solution is to incorporate bibliometrics into library and information
science curricula. Alvin M. Schrader discusses how a course on bibliometrics might be taught and provides a sample syllabus.
In addition to the contributors, I would like to credit the following
people for their contributions to this issue: Charles Davis for his encouragement and guidance; Michael Gorman, Bernard Hurley, Rebecca
Lenzini, Daniel O'Connor, and Charlene Renner for their editorial
advice and assistance; Wendy Darre and Lisa Olson for their willingness
to type and retype seemingly endless tables and bibliographies; and,
finally, the editorial staff of Library Trends for their usual excellent
job.

References
1. Pritchard, Alan. "Statistical Bibliography or Bibliometrics?" Journal of
Documentation 25(Dec. 1969):348-49.
2. Hulme, E. Wyndham. Statistical Bibliography in Relation to the Growth of
Modern Civilization. London: 1923.


Empirical Laws, Theory Construction and Bibliometrics
DANIEL O. O'CONNOR
HENRY VOOS

BIBLIOMETRICS HAS COMMANDED the attention of numerous individuals
in library and information science. The measurement of bibliographic
information offers the promise of providing a theory that will resolve
many practical problems. It is claimed that patterns of author productivity, literature growth rates and related statistical distributions can be
used to evaluate authors, assess disciplines and manage collections. Yet,
it is unclear if bibliometrics is merely a method or if it meets the test of a
theory in its ability to explain and predict phenomena. This paper
examines the properties of bibliometric distributions in a nontechnical
manner.
Twelve years ago, Pritchard coined the term bibliometrics and
defined it as "the application of mathematics and statistical methods to
books and other media of communication." Its purpose was:
1. To shed light on the processes of written communication and of
the nature and course of development of a discipline (in so far as this is
displayed through written communication), by means of counting
and analyzing the various facets of written communication ...;
2. The assembling and interpretation of statistics relating to books
and periodicals ...to demonstrate historical movements, to determine
the national or universal research of books and journals, and to
ascertain in many local situations the general use of books and
journals.2

Daniel O. O'Connor is Assistant Professor, and Henry Voos is Professor, Graduate School
of Library and Information Studies, Rutgers University, New Brunswick, New Jersey.

Both of these purposes emphasize that bibliometrics is primarily a
method. The scope of bibliometrics includes studying the relationship
within a literature (e.g., citation studies) or describing a literature.3
Typically, these descriptions focus on consistent patterns involving
authors, monographs, journals, or subject/language. The literature of
bibliometrics is growing rapidly and a recent bibliography lists 2032
entries,4 while another announced bibliography has 600 entries covering the years 1874 through 1959.5
Two concerns have occupied much of the bibliometric literature:
an emphasis on mathematical or statistical methods, and a search for
theoretical propositions. Fairthorne, Price and Bookstein have stated
that there is great consistency among the various bibliometric distributions.6 The Bradford, Lotka and Zipf distributions are considered the
basic laws of bibliometrics, and each of these distributions was empirically derived. The distributions are similar to each other as special cases
of a hyperbolic distribution. Fairthorne summarized the similarities of
the bibliometric distributions in 1969: "Almost all of them, whatever
their starting-point, end with some kind of hyperbolic distribution in
which the product of fixed powers of the variables is constant. In its
simplest discrete manifestation an input increasing geometrically produces a yield increasing arithmetically."
Thus, the similarities of the Lotka, Bradford and Zipf distributions
are not surprising. These distributions are based on rank-order frequencies (or rank-size relations) where objects are classified and then ranked.
Zipf found that rank times size equals a constant. As derived in a more
general form by Mandelbrot, frequency of occurrence is a function of
constants applied to size and rank.9 Similar distributions emerge in
describing the following phenomena: rivers, populations of cities, biological genera, books (ranked by number of pages), author productivity,
citations to journals, and frequency of words.

Relationship Between Empirical Laws and Theories

The occurrence of dissimilar events at constant rates may allow for
prediction of the frequency of events, but it does not explain their
causes.11 There is no reason to assume that the ability to make empirical
predictions will eventually lead to theoretical explanations. This philosophical issue has been dealt with by Carnap:

...theoretical laws cannot be arrived at simply by taking the empirical
laws, then generalizing a few steps further. How does a physicist
arrive at an empirical law? He observes certain events in nature. He
notices a certain regularity. He describes this regularity by making an
inductive generalization. It might be supposed that he could now put
together a group of empirical laws, observe some sort of pattern, make
a wider inductive generalization, and arrive at a theoretical law. Such
is not the case.12

Carnap further states that generalization from observations will never
produce a theory; instead, a theory arises not as a generalization of facts
but as a hypothesis.13 Fairthorne addressed this problem in bibliometrics: "I have surveyed the hyperbolic laws as a whole, with bibliometric
applications as particular cases. This unifies the formal aspects of this
type of behavior, and collects tools for dealing with it, without invoking
any hypothesis about the proximate causes of such behavior."14
Price has proposed a general bibliometric theory based on a hyperbolic curve, which he has named the Cumulative Advantage Distribution.15 In speculating on the reasons for this distribution, Price makes a
valuable contribution to concept formation and theory construction in
bibliometrics. However, his Cumulative Advantage Distribution would
be subject to Rapoport's criticism of similar rank-size laws:
Clearly, if objects can be arranged according to size, beginning with
the largest, some monotonically decreasing curve will describe the
data. The fact that many of these curves are fairly well approximated
by hyperbolas proves nothing, since an infinitely large number of
curves resemble hyperbolas sufficiently closely to be identified as
hyperbolas. No theoretical conclusion can be drawn from the fact that
many J curves look alike. Theoretical conclusions can be drawn only
if a rationale can be proposed that implies that the curves must belong
to a certain class. The content of the rationales becomes, then, the
content-bound theory.16

As Rapoport later points out, it is the classificatory procedure that is
important along with the prior expectations of the classifier. Hill
identifies three sources of uncertainty in such statistical laws: "First, the
probabilistic mechanism by which the population frequencies ...are
determined; secondly, the method of sampling from the population;
thirdly, the way in which the sample is classified." Thus, it is doubtful
that the similarities of the various bibliometric distributions have great
theoretical importance.
None of this denies the practical utility of applying bibliometric
distributions to library problems, but it does bring into question two
concerns: (1) the generality of bibliometric techniques, and (2) the
likelihood that the bibliometric patterns will change over time.
Although Line has denied many of the practical claims attributed to
bibliometrics,19 Broadus has applied citation analyses to collection
building.20 Other applications to collection management can be found
in a special bibliometrics issue of Collection Management edited by
The widespread application of practical bibliometric
methods-useful to library managers-will continue to be limited until
a more general, unified theory is developed. Such a theory should allow
for the possibility of change in bibliometric distributions. Hill stated
that: "Zipf's law for city sizes has held until very recently, but the
development of suburbia seems to have altered matters to a certain
extent. A more sophisticated model ...would deal with the dynamics of
the situation, and not merely the one-dimensional view obtained at a
given point in time."22 A similar limitation could apply to the long-term stability of bibliometric distributions, and this might account for
the minor differences in the distributions associated with various
disciplines.
Another limitation of bibliometric distributions is the use of unidimensional descriptions of consistency in author productivity or journal citation patterns. The more popular, library-related areas of
bibliometrics (Lotka and Bradford) are based on plotting one or two
variables which are then reduced to a single dimension. Such descriptive
analyses usually lack explanatory power, since there are not enough
variables to posit that one event causally influences the outcome of
another event. If bibliometric distributions have identifiable causes,
then multidimensional analyses may provide more fruitful avenues of
research than plotting new hyperbolic distributions. This multidimensional issue has serious implications for the sustained relevance of
bibliometric distributions as aids to library decision-making. This does
not deny the immediate usefulness of some of these distributions, but it
does bring into question their explanatory power and their ability to
generate new theoretical hypotheses. Two of these distributions (Lotka
and Bradford) will now be examined in more detail.

The Lotka and Bradford Distributions


The Lotka distribution is based on an inverse square law where the
number of authors writing n papers is 1/n² of the number of authors
writing one paper. Each subject area can have associated with it an
exponent representing its specific rate of author productivity.23 But this
does not explain why one individual produces dozens of published
papers on a subject, another individual produces several papers, and a
third individual produces none. The variability of author productivity
could be partly explained by each individual's background (e.g., schools
attended, influence of mentors), current information environment (e.g.,
access to current publications, colleagues, libraries), and other characteristics. The individual's affiliation with a particular discipline could
establish different expectation levels for author productivity. For example, it is estimated that scientists produce an average of 3.8 articles per
year, while those in the social sciences produce only an average of 0.5
articles per year.25
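The inverse square relation described above can be illustrated with a short calculation. The following Python sketch is an editorial illustration rather than part of the original argument: the function name `lotka_expected_authors` and the figure of 100 single-paper authors are assumptions chosen for the example, and the exponent defaults to 2 but may be varied to represent a subject area with a different rate of productivity.

```python
def lotka_expected_authors(single_paper_authors, n, exponent=2.0):
    """Expected number of authors writing exactly n papers under Lotka's law.

    single_paper_authors: observed number of authors with one paper.
    exponent: 2.0 gives the inverse square law; other values are
              hypothetical subject-specific exponents.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return single_paper_authors / (n ** exponent)


if __name__ == "__main__":
    # Hypothetical literature with 100 authors of a single paper.
    for n in range(1, 6):
        print(n, round(lotka_expected_authors(100, n), 1))
```

With 100 single-paper authors, the sketch yields 25 authors with two papers and about 11 with three. It describes the distribution only; it says nothing about why a given author falls where he or she does.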
It could be proposed that author productivity is a function of many
causes, and these might be grouped into two major conceptual areas:
(1) an author's personal characteristics (e.g., intelligence, achievement,
personality, expectations); and (2) the author's environment or situation (e.g., colleagues, availability of information, the problem under
investigation, author's field or discipline). In addition, the interactions
among personal characteristics and environmental characteristics
would create a third conceptual area for future study.26 Numerous
variables could be developed from these three conceptual areas while
recognizing that the point of this is to recast author productivity as
something that is more than a univariate statistical distribution. Author
productivity can be viewed as having a multitude of preconditions
which cause authors to behave in different ways. It is assumed that the
variability in these causes is systematically related to the variability in
productivity. In the building of causal models, it is essential that concepts are logically related in the bibliometric theory. Necessary and
sufficient preconditions need to be stated to ensure that causes and not
consequences are identified. For example, is author productivity a
function of field affiliation, or is it the other way around?
It is also important to determine how author productivity might be
changed by internal motivations, outside influences or manipulation. It
might be assumed that tenure and promotion requirements for college
and university faculty influence the degree to which individuals produce manuscripts for publication. It would be interesting to investigate
the influence of such requirements on author productivity. Such a study
is but one method to inject the dynamics of change into the multivariate
model discussed earlier. Another test of this hypothesis would be to
compare publication patterns of academic librarians who have faculty
status (and might be expected to publish) with those who do not have
faculty status. Even at the descriptive level, this could have an influence
on the exponent associated with the Lotka distribution. External factors
could also influence publication patterns of authors. Again, librarianship could be used in the investigation of this hypothesis. Many new
library journals and new library publishers of monographs were formed
during the past five years. It might be hypothesized that these external
events have influenced the rate of author productivity in librarianship
over the past decade.
The Bradford distribution (or Law of Scatter) groups journals and
articles to identify the number of periodicals relevant to a particular
subject. Its computation is based on the total number of articles published by the journals in a particular subject area. A constant is then
computed for that subject area, which is used to determine the percentage of total coverage by various numbers of journals in a field. One
formula for this is:
$$R(n) = N \log (n/s) \qquad (1 \le n \le N)$$
where
R(n) = total number of journal articles (contributed by the n most productive journals)
N = total number of journals
s = a constant (specific to a subject area).27
For example, Brookes applies this formula to a scientific literature
which yielded a total of 2000 articles from 400 journals. The results
indicate that 40 percent of the articles are contained in 5 percent of the
journals. Further, 80 percent of the articles are contained in 37 percent of
the journals.28 A core of journals is thus identified which could be used
to select the essential journals for a special collection.
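The percentages in this example can be checked by direct calculation. The Python sketch below is an illustration under stated assumptions, not Brookes's own procedure: it assumes the logarithm in the formula is the natural logarithm, and it obtains the constant s by requiring that R(N) equal the total number of articles. The function names are hypothetical. Under these assumptions the figures of 40 percent and 80 percent are reproduced closely.

```python
import math


def bradford_constant(total_articles, total_journals):
    """Solve R(N) = N * ln(N / s) = total_articles for the constant s."""
    return total_journals / math.exp(total_articles / total_journals)


def cumulative_articles(n, total_journals, s):
    """R(n) = N * ln(n / s): articles in the n most productive journals."""
    return total_journals * math.log(n / s)


def coverage(journal_share, total_articles, total_journals):
    """Fraction of all articles found in the top `journal_share` of journals."""
    s = bradford_constant(total_articles, total_journals)
    n = journal_share * total_journals
    return cumulative_articles(n, total_journals, s) / total_articles


if __name__ == "__main__":
    # Figures from the example: 2000 articles scattered over 400 journals.
    for share in (0.05, 0.37):
        print(f"{share:.0%} of journals -> {coverage(share, 2000, 400):.1%} of articles")
```

The formula is a poor guide for the very highest-ranked journals: when n is smaller than s, the computed R(n) is negative, so the sketch should be applied only to ranks well beyond the first few.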
Originally, Bradford had studied articles and journals to improve
abstracting services. He was concerned about the statistical distribution
he identified, and Fairthorne reports on this: "Though in public and,
rather ambiguously, in private Bradford tended to belittle this finding,
he did make use of it. His private conversations gave me the impression
that he was sure ...that he had not enough evidence or explanation to
sustain it in public debate."29 Others have since affirmed that there is
enough evidence to support Bradford's statistical distribution and to
link it to a general bibliometric distribution.30 Brookes cites numerous
uses of a Bradford bibliograph: items borrowed from a library, users
ranked by number of items they borrow, number of items cited (using a
nonrestrictive Bradford-Zipf distribution), and the index terms assigned
to documents.31 These uses of a Bradford distribution have value for
library decision-making, since the distribution allows for the prediction
of regularity in a variety of events. Knowledge of sources and their items
(i.e., the Bradford formula) permits prediction of core collections, core
users and core index terms. However, explanation is lacking which
would give theoretical import to Bradford's statistical distribution.
Why, for example, do a relatively small number of journals represent
the core for any given field? Is this due to human limits in handling
certain quantities of information? Are many articles published to
increase an author's productivity with little concern that the article be
cited (or even read)?
Bradford's distribution was made more general by grouping journals according to the number of citations they receive. Using his citation
indexing data base, Garfield claimed: "I can with confidence generalize
Bradford's bibliographical law concerning the concentration and dispersion of the literature of individual disciplines and specialities.
Going beyond Bradford's studies, I can say that a combination of the
literature of individual disciplines and specialities produces a multidisciplinary core for all of science comprising no more than 1000 journals."32
Garfield then identifies many variables besides scientific merit
which might contribute to high citation frequency. It would be through
the systematic study of these variables (author's reputation, circulation,
number of articles published, library holding, etc.) that reasons might
emerge to explain why one journal receives numerous citations while
another receives very few. A similar analysis can be applied to the core
users of a library. It is not enough to predict the number of core users and
their amount of use; instead, the characteristics that make an individual
a core user need to be identified. Do some individuals have a reading
habit analogous to a physical addiction? Are the backgrounds of these
individuals similar, and are their other information behaviors similar?
Finally, it is likely that the Bradford distribution is susceptible to
change. Swanson has proposed a new model for journal articles, and he
advocates that authors state the reasons for citing each reference.33 If
Swanson's prototype were implemented, it might produce drastic
changes in citation patterns.
All of this points to the need for a more rigorous definition of the
bibliometric problem. The analyses of bibliographic information
should culminate in a causal model that accounts for variabilities in
such phenomena as author productivity and journal citation patterns.
The line between explanation and prediction can often be confused. For
example, the movement of the sun was once explained by the god Helios
riding a golden chariot across the sky. Later, it was hypothesized that the
sun revolved around the earth. This theory did allow for accurate
predictions; for example, the Gregorian calendar was based on the
theory that the sun revolved around the earth, yet the calendar errs by
only one day every 3323 years. Prediction accuracy is important but it
may be an artifact of empirical regularity. A bibliometric theory, if it is
to be useful, must give equal emphasis to its explanatory power and its
prediction accuracy.

Bibliometric Concepts and Theory Construction


There is a wide range of bibliometric concerns beyond author
productivity or journal citation patterns, and these varied interests may
create problems in the development of a unified theory. This will be
examined in more detail after related bibliometric topics are identified.
One area often included in bibliometric reviews is Zipf's law. It is a
statistical distribution based on a hyperbolic curve which states that, if
words are ranked according to their frequency of occurrence (f), the nth
ranking word will appear approximately k/n times where k is a constant, or f(n) = k/n.34 Zipf's law has much potential for the descriptive
evaluation of subject authority files and related aspects of indexing.
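The rank-frequency relation f(n) = k/n can be made concrete with a small tabulation. The Python sketch below is an editorial illustration: the function names and the toy word list are assumptions, and taking the frequency of the top-ranked word as the constant k is only one simple way of fixing k, not a method attributed to Zipf.

```python
from collections import Counter


def rank_frequency(words):
    """Return (rank, word, frequency) tuples, most frequent word first.

    Ties in frequency are broken alphabetically so the ranking is stable.
    """
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


def zipf_expected(k, rank):
    """Expected frequency of the word at a given rank: f(n) = k / n."""
    return k / rank


if __name__ == "__main__":
    text = "the cat and the dog and the bird".split()
    table = rank_frequency(text)
    k = table[0][2]  # assumption: use the top frequency as the constant k
    for rank, word, freq in table:
        print(rank, word, freq, round(zipf_expected(k, rank), 2))
```

Applied to a list of assigned index terms rather than running text, the same tabulation would show how far an authority file departs from the expected curve, which is the kind of descriptive evaluation suggested above.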
Other major areas of interest which could fall within bibliometrics
include the half-life rates to assess the currency of a literature and impact
factors to evaluate the importance of journals. Burton and Kebler studied the half-life of different scientific literatures to identify the obsolescence rate of references in journal articles.35 For example, physics
literature has a half-life of 4.6 years (i.e., one-half of all references in
journal articles were dated within the last 4.6 years), while chemistry has
a half-life of 8.1 years. Another view of obsolescence is to relate it to the
growth of a literature: the faster the rate of growth, the less is the scatter
and the more rapid the obsolescence.36 Closely related to half-life is
Price's index to assess the hardness of journals.37 Those journals with
very recent references are considered to be at the research front as a hard
science. Those journals with references to more retrospective materials
are considered less hard, less scientific. For example, physics journals
contain the highest percentage of references to materials published in
the past five years (over 60 percent), while some English literature
journals only have 10 percent of their references dated in the past five
years.
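Both measures can be computed from the ages of the references in a sample of articles. The Python sketch below is an illustration under stated assumptions: half-life is taken as the median age of the references, following the parenthetical definition above, and Price's index is taken as the share of references no more than five years old. The function names, the inclusive five-year boundary, and the sample ages are hypothetical.

```python
from statistics import median


def half_life(reference_ages):
    """Median age in years: half of the references are this old or younger."""
    if not reference_ages:
        raise ValueError("at least one reference age is required")
    return median(reference_ages)


def prices_index(reference_ages, window=5):
    """Share of references published within the last `window` years."""
    if not reference_ages:
        raise ValueError("at least one reference age is required")
    recent = sum(1 for age in reference_ages if age <= window)
    return recent / len(reference_ages)


if __name__ == "__main__":
    # Hypothetical ages (in years) of ten references from one journal.
    ages = [1, 2, 3, 4, 5, 8, 10, 15, 20, 30]
    print("half-life:", half_life(ages))
    print("Price's index:", prices_index(ages))
```

For the sample ages the half-life is 6.5 years and Price's index is 0.5. A literature with a shorter half-life will generally show a higher index, which is why the two measures are described as closely related.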
Garfield developed a journal's impact factor as the number of
citations a journal receives divided by the number of articles published
in a given time period.38 Narin developed influence weights as the total
number of citations to a journal divided by the total number of references from a journal (excluding self-reference and self-citation).39
Although these measures are used to evaluate journals, they can also be
extended to evaluate authors by the number of citations individuals
receive. Meadows gives an account of the uses of such citations to assess
an author's reputation and importance.40
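The two ratios, as worded above, amount to simple divisions. The Python sketch below follows that wording only; it is not a reproduction of Garfield's or Narin's published procedures, and the function names, the treatment of self-citations (subtracted from both counts), and the sample figures are assumptions made for illustration.

```python
def impact_factor(citations_received, articles_published):
    """Citations a journal receives per article published in the period."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_received / articles_published


def influence_weight(citations_to_journal, references_from_journal, self_citations=0):
    """Citations received divided by references given.

    Assumption: a self-citation counts once as a citation to the journal
    and once as a reference from it, so it is removed from both counts.
    """
    given = references_from_journal - self_citations
    if given <= 0:
        raise ValueError("the journal must give at least one outside reference")
    return (citations_to_journal - self_citations) / given


if __name__ == "__main__":
    # Hypothetical journal: 300 citations to 120 articles in the period.
    print("impact factor:", impact_factor(300, 120))
    # Hypothetical journal: cited 500 times, gives 250 references, 50 to itself.
    print("influence weight:", influence_weight(500, 250, self_citations=50))
```

The same divisions could be applied to an individual author by substituting the author's citations and papers for those of the journal.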

These various measures employ different units of analysis, and this
creates a problem of generality across bibliometric studies. McGrath
gives an excellent treatment of the unit of analysis problem as it relates
to collection development.41 He distinguishes among the objects studied (i.e., the unit of analysis), the attributes of those objects (i.e., the
variables), and the appropriate levels of theoretical generality. These
distinctions are applicable to the bibliometric problem. For example, if
author productivity is the area under investigation, then authors are the
unit of analysis and their publications are the dependent variable. The
explanatory or independent variables would be those that influence an
author to contribute to the publication process (as discussed earlier in
relation to the Lotka distribution). This same unit of analysis (authors) would be used in investigations of author citation rates to
assess the significance of an individual's contributions. The number of
times an author is cited or the author's average number of citations per
journal article might serve as the dependent variable. The independent
variables could come from measures of collegial support, number of
professional papers delivered at meetings, individual's influence on
students, and the individual's personal characteristics. Author productivity and author importance could be investigated in the same study
because they share the same unit of analysis. However, this is not true for
the other areas of bibliometrics.
Journal citation patterns shift the unit of analysis from individuals
to journals. The dependent measure might be currency of references or
number of citations the journal receives from other publications. The
independent variables could encompass the journal's refereeing process, manuscript acceptance rate, number of articles the journal publishes, some rating of the journal's prestige, and number of library or
individual subscriptions. Of course, numerous independent variables
could be posited to explain the number of citations a journal receives.
But this unit of analysis (the journal) changes if the Zipf distribution
is under investigation.
Zipf's law drops the unit of analysis to the word. A dependent
measure might be the frequency of the word and the independent
variables could include measures on the fundamental structure of language. Other explanatory variables might be the various principles
associated with vocabulary control or the structure of indexing terms.
These independent variables are subject to manipulation to determine
the effect they may have on word frequencies. Thus, bibliometrics spans
three major units of analysis: authors, journals and words. There is a
fourth unit (subject or discipline) not covered here, but it is implied in
the work of those who distinguish the differences across fields or disciplines (e.g., the behaviors of the literatures associated with the humanities versus the literature of the social sciences versus the sciences).42
Much of this research has focused on the literatures of the scientific
disciplines.
Since independent variables are grouped into conceptual areas the
interrelationships of which become the theory, the unit of analysis is
critical to the generality of the results. It is unlikely that research results
would ever be generalized beyond the unit of analysis. It could prove
impossible to generalize a common theory from studies of individuals
and studies of journals. At best, two middle-range theories might be
developed which could suggest hypotheses for a single, third area of
investigation. This hope of a unified theory has plagued other professions, and it is doubtful that bibliometrics can surpass the barrier
created by multiple units of analysis. Instead, it might be more productive to split the ill-defined field of bibliometrics into separate components where the unit of analysis is consistent and results can be
generalized across studies.
The various bibliometric models proposed here will need to pay
close attention to the issue of external validity. The models need to be
more than explanatory (i.e., explaining a large proportion of the variability in the dependent measure); indeed, the models will have to prove
their worth by making actual predictions using new cases. This allows
for the importance (or weight) of each variable in the model to be tested
in a rigorous manner. It provides proof that the theory works with new
data in real situations. It also assures that hypothesized nonlinear
relationships among the independent variables do, in fact, contribute to
explaining the variability in the dependent measures.
Finally, bibliometrics has much to offer the library and information field. The work of the past, by Lotka, Bradford and Zipf, is
valuable in helping librarians assess patterns of authorship (for cataloging rule changes), identify core collections (for collection management), and design better retrieval systems (for authority control).
However, the continued emphasis on the similarities of the bibliometric
statistical distributions is not regarded here as a fruitful endeavor. The
long-term benefits of bibliometrics will begin to emerge when attention
is directed toward causal explanations of bibliographic phenomena. At
that point, bibliometrics will again offer practical benefits to libraries.


References
1. Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25(Dec. 1969):348.
2. Pritchard, Alan. Computers, Statistical Bibliography and Abstracting Services,
1968 (unpublished); and Raisig, L.M. Statistical Bibliography in Health Sciences.
Bulletin of the Medical Library Association 50(July 1962):450, 461. Cited in Pritchard,
Statistical Bibliography, p. 349.
3. Simon, Herbert R. Why Analyze Bibliographies? Library Trends
22(July 1973):3-8; and Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. London: Clive Bingley, 1978.
4. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing and
Analysis. Stockholm: Royal Institute of Technology Library, 1980.
5. Pritchard, Alan, and Wittig, Glen. Bibliometrics: A Bibliography and Index
(1874-1959). Vol. 1. Watford, Eng.: ALLM Books, in press.
6. Fairthorne, Robert A. Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction. Journal of Documentation
25(Dec. 1969):319-43; Price, Derek de Solla. A General Theory of Bibliometric and Other
Cumulative Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306; and
Bookstein, Abraham. The Bibliometric Distributions. Library Quarterly 46(Oct.
1976):416-23.
7. Narin, Francis, and Moll, Joy K. Bibliometrics. In Annual Review of Information Science and Technology, edited by Martha E. Williams, p. 45. Vol. 12. Washington,
D.C.: American Society for Information Science, 1977.
8. Fairthorne, Empirical Hyperbolic Distributions, p. 322.
9. Rapoport, Anatol. Rank-Size Relations. In International Encyclopedia of
Statistics, edited by William Kruskal and Judith Tanur, p. 851. New York: Free Press,
1978.
10. Ibid., pp. 847-54; Price, A General Theory; and Bookstein, Bibliometric
Distributions.
11. Fairthorne, Empirical Hyperbolic Distributions, p. 321.
12. Carnap, Rudolf. Philosophical Foundations of Physics. New York: Basic Books,
1966, p. 228.
13. Ibid., p. 230.
14. Fairthorne, Empirical Hyperbolic Distributions, p. 332.
15. Price, A General Theory.
16. Rapoport, Rank-Size Relations, p. 851.
17. Ibid., p. 853.
18. Hill, Bruce M. Zipf's Law and Prior Distributions for the Composition of a
Population. Journal of the American Statistical Association 65(Sept. 1970):1230.
19. Line, Maurice B., and Sandison, Alexander. Practical Interpretation of Citation
and Library Use Studies. College & Research Libraries 36(Sept. 1975):393-96; and Line,
Maurice B. Rank Lists Based on Citations and Library Uses as Indicators of Journal
Usage in Individual Libraries. Collection Management 2(Winter 1978):313-16.
20. Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship, vol. 7, edited by Melvin J. Voigt and Michael
H. Harris, pp. 299-335. New York: Academic Press, 1977.
21. See Moll, Joy K. Bibliometrics in Library Collection Management: Preface to
the Special Issue on Bibliometrics. Collection Management 2(Fall 1978):195-98.
22. Hill, Bruce M. The Rank-Frequency Form of Zipf's Law. Journal of the
American Statistical Association 69(Dec. 1974):1025.
23. Narin and Moll, Bibliometrics, p. 46.
24. Merton, Robert K. The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press, 1973; and Zuckerman, Harriet. Scientific
Elite: Nobel Laureates in the United States. New York: Free Press, 1977.
25. Lindsey, Duncan. The Scientific Publication System in Social Science. San Francisco: Jossey-Bass, 1978, p. 89.
26. Mischel, Walter. Toward a Cognitive Social Learning Reconceptualization of
Personality. Psychological Review 80(July 1973):252-83.
27. Brookes, B.C. Numerical Methods of Bibliographic Analysis. Library Trends
22(July 1973):26.
28. Ibid., p. 27.
29. Fairthorne, Empirical Hyperbolic Distributions, p. 333.
30. Price, A General Theory; and Bookstein, Bibliometric Distributions.
31. Brookes, Numerical Methods of Bibliographic Analysis.
32. Garfield, Eugene. Citation Analysis as a Tool in Journal Evaluation. Science
178(3 Nov. 1972):476.
33. Swanson, Don R. Information Retrieval as a Trial-and-Error Process. Library
Quarterly 47(April 1977):128-48.
34. Narin and Moll, Bibliometrics, p. 46.
35. Burton, Robert E., and Kebler, R.W. The Half-Life of Some Scientific and
Technical Literatures. American Documentation 11(Jan. 1960):18-22.
36. Brookes, Numerical Methods of Bibliographic Analysis, p. 34.
37. Price, Derek de Solla. Citation Measures of Hard Science, Soft Science,
Technology, and Nonscience. In Communication Among Scientists and Engineers,
edited by Carnot E. Nelson and Donald K. Pollock, pp. 3-22. Lexington, Mass.: Heath
Lexington Books, 1970.
38. Garfield, Citation Analysis, pp. 471-79.
39. Narin, Francis. Evaluative Bibliometrics: The Use of Publication and Citation
Analysis in the Evaluation of Scientific Activity. Cherry Hill, N. J.: Computer Horizons,
1976. (PB 252 399)
40. Meadows, Arthur J. Communication in Science. London: Butterworths, 1974.
41. McGrath, William E. Circulation Studies and Collection Development. Collection Development in Libraries, edited by Robert D. Stueart and George B. Miller, Jr.,
pp. 373-403. Greenwich, Conn.: JAI Press, 1980.
42. Lindsey, Scientific Publication System; Price, Citation Measures of Hard Science;
and Garvey, William D. Communication: The Essence of Science. Oxford: Pergamon
Press, 1979.

Lotka's Law Revisited

WILLIAM GRAY POTTER

Introduction

THE ORIGINAL STATEMENT of what has come to be known as Lotka's law
was made in Lotka's 1926 journal article, "The Frequency Distribution
of Scientific Productivity": "...the number (of authors) making n contributions is about $1/n^2$ of those making one; and the proportion of all
contributors, that make a single contribution, is about 60 percent." To
derive his "inverse square law," Lotka used comprehensive bibliographies in chemistry and physics and plotted the percentage of authors
making 1, 2, 3,...n contributions against the number of contributions
with both variables on a logarithmic scale. He then used the least-squares method to calculate the slope of the line that best fit the plotted
data, and he found that the slope was approximately -2.
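The fitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Lotka's own computation: the function name `loglog_slope` and the `ideal` data set (an exact inverse square distribution with 60 percent of authors making one contribution) are hypothetical and are not taken from Lotka's bibliographies.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log10(y) against log10(x) for (x, y) pairs."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares: slope = S_xy / S_xx
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical data: percentage of authors making n contributions,
# constructed to follow an exact inverse square law.
ideal = [(n, 60.0 / n ** 2) for n in range(1, 11)]
slope = loglog_slope(ideal)
print(f"fitted slope: {slope:.3f}")
```

For real bibliographic counts the fitted slope would only approximate -2, and the result depends on how many of the sparsely populated high-productivity points are included in the fit.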
Since the publication of Lotka's original article in 1926, much
research has been done on author productivity in various subject fields.
The publications arising from this research have come to be associated
with Lotka's work and are often cited as proving or supporting his
findings. However, a review of this literature reveals that Lotka's article
was not cited until 1941, that his distribution was not termed "Lotka's
law" until 1949, and that no attempts were made to test the applicability
of Lotka's law to other disciplines until 1973. The present article will
discuss the literature that has become associated with Lotka's law and
will attempt to identify the important factors of Lotka's original methodology which should be considered when attempting to test the
applicability of Lotka's law.
William Gray Potter is Acquisitions Librarian, University of Illinois Library at Urbana-Champaign.

Applying Lotka's Law

Russell C. Coile in 1977 admonished investigators who, studying
the applicability of Lotka's law to the humanities and to map librarianship, may have misinterpreted Lotka's law and concluded erroneously that the law applies to these fields. In a cogent exposition, Coile
detailed the derivation of Lotka's law in Lotka's original article. He
then proceeded to test the applicability of Lotka's law to data from
Murphy's 1973 study of the humanities3 and Schorr's 1975 study of map
librarianship4 using the Kolmogorov-Smirnov statistic. In both cases, it
was found that, contrary to the authors' claims, Lotka's law did not
apply to the observed data. Coile attributes Murphy's erroneous conclusion to a misinterpretation of Lotka's formulation, to the inclusion of
coauthors (whereas Lotka counted only the senior author), and to the
failure to use an appropriate statistical test of significance. Schorr also
counted coauthors and then used the chi-square test to determine if
Lotka's law held. Coile contends that the chi-square test is not an
appropriate test in this case because the table entries for authors with
five to nine contributions show fewer than five observations.
The reason these data do not fit Lotka's law may be simply that
Lotka's law does not apply in the fields studied. However, the scope of
the studies by Murphy and Schorr does not appear to be comparable to
that of Lotka's work. Lotka drew 6891 names from the 1907-16 Decennial Index to Chemical Abstracts5 and 1325 names from Auerbach's
Geschichtstafeln der Physik, which included outstanding contributions
in physics throughout history up to 1900.6 Murphy took 170 authors
drawn from the first decade of Technology and Culture. Schorr used 326
authors publishing between 1921 and 1973 on map librarianship based
on a bibliography he had compiled earlier. The bibliographic sources
used by Murphy and Schorr do not approach the coverage, in terms of
either subjects or time, of the sources used by Lotka. The same objections can also be applied to Schorr's 1974 study of library science7 and
Voos's 1974 study of information science.
In order to test the applicability of Lotka's law to a set of data, a
statistical test is needed. Coile recommends the Kolmogorov-Smirnov
(K-S) statistic. The K-S test determines the maximum deviation, D:
$D = \max \lvert F_0(X) - S_n(X) \rvert$
where $F_0(X)$ is the theoretical cumulative frequency function and $S_n(X)$
is the observed cumulative frequency function of a sample of n observations. At a 0.01 level of significance, the K-S statistic is equal to $1.63/\sqrt{n}$.
If D is greater than the K-S statistic, then the sample distribution does
not fit the theoretical distribution.
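As a worked illustration (an editorial sketch, not part of Lotka's or Coile's text), the theoretical proportions implied by an exact inverse square law can be obtained by normalizing $1/x^2$ over all positive integers, using the standard result that the sum of $1/x^2$ is $\pi^2/6$. The symbols $f(x)$ for the proportion of authors with $x$ contributions and $C$ for the normalizing constant are introduced here for convenience and are assumptions of this sketch.

```latex
\[
  f(x) = \frac{C}{x^{2}}, \qquad
  \sum_{x=1}^{\infty} \frac{C}{x^{2}} = C\,\frac{\pi^{2}}{6} = 1
  \;\Longrightarrow\;
  C = \frac{6}{\pi^{2}} \approx 0.6079,
\]
\[
  F_0(X) = \sum_{x=1}^{X} \frac{6}{\pi^{2} x^{2}}, \qquad
  F_0(1) \approx 0.6079, \quad F_0(2) \approx 0.7599,
\]
\[
  \text{and, for a sample of } n = 1325 \text{ authors,} \quad
  \frac{1.63}{\sqrt{1325}} \approx 0.0448 .
\]
```

The constant $C \approx 0.6079$ is the source of the statement that about 60 percent of contributors make a single contribution.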
The K-S statistic was used here to test the fit of Lotka's data to the
law that now bears his name. Using Lotka's law as the theoretical
distribution and the data from Lotka's study of Chemical Abstracts and
Auerbach's Geschichtstafeln der Physik as the observed data, it was
found that a portion of Lotka's data does not fit his law. As shown in
table 1, D from the Chemical Abstracts data is 0.0287, and the K-S
statistic is $1.63/\sqrt{6891}$, or 0.0195. The value of D is greater, and therefore
Lotka's law does not apply to Lotka's sample from Chemical Abstracts.
With the Auerbach figures, D is 0.0253 and the K-S statistic is $1.63/\sqrt{1325}$, or 0.0448 (see table 2). The value of D is less, and therefore Lotka's
law does apply to Lotka's figures from Auerbach's Geschichtstafeln der
Physik. Lotka's law, then, applies to only a portion of his data.
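The comparison just described can be reproduced approximately with a short Python sketch. The observed proportions are transcribed from tables 1 and 2, and the sample sizes (6891 and 1325) are those reported for Lotka's two sources. The function names are hypothetical, and the theoretical proportions are assumed to follow an exact inverse square law normalized to sum to one.

```python
import math
from itertools import accumulate

def lotka_expected(max_x):
    """Theoretical proportions of authors with 1..max_x contributions
    under an exact inverse square law (assumed normalization 6/pi^2)."""
    c = 6 / math.pi ** 2  # proportion of authors with one contribution
    return [c / x ** 2 for x in range(1, max_x + 1)]

def ks_max_deviation(observed, expected):
    """D = max |F_0(X) - S_n(X)| over the cumulative proportions."""
    s_n = accumulate(observed)   # observed cumulative proportions
    f_0 = accumulate(expected)   # theoretical cumulative proportions
    return max(abs(f - s) for f, s in zip(f_0, s_n))

def ks_critical_01(n):
    """Critical value at the 0.01 level, 1.63 / sqrt(n), as given in the text."""
    return 1.63 / math.sqrt(n)

# Observed proportions for 1..10 contributions, from tables 1 and 2.
chem_abstracts = [0.5792, 0.1537, 0.0715, 0.0416, 0.0267,
                  0.0190, 0.0164, 0.0123, 0.0093, 0.0094]
auerbach = [0.5917, 0.1540, 0.0958, 0.0377, 0.0249,
            0.0211, 0.0143, 0.0143, 0.0045, 0.0053]

results = {}
for name, observed, n in [("Chemical Abstracts", chem_abstracts, 6891),
                          ("Auerbach", auerbach, 1325)]:
    d = ks_max_deviation(observed, lotka_expected(len(observed)))
    critical = ks_critical_01(n)
    results[name] = (d, critical, d <= critical)
    print(f"{name}: D = {d:.4f}, critical = {critical:.4f}, fits = {d <= critical}")
```

Because the sketch uses unrounded theoretical proportions, the computed values may differ in the last decimal place from the tabulated ones, which are built from proportions rounded to four places; the conclusions for the two samples are unaffected.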

TABLE 1
LOTKA, Chemical Abstracts DATA
PROPORTION OF AUTHORS

No.
Contributions   Observed   S_n(X)   Expected   F_0(X)   |F_0(X) - S_n(X)|
      1          0.5792    0.5792    0.6079    0.6079        0.0287
      2          0.1537    0.7329    0.1520    0.7599        0.0270
      3          0.0715    0.8044    0.0675    0.8274        0.0230
      4          0.0416    0.8460    0.0380    0.8654        0.0194
      5          0.0267    0.8727    0.0243    0.8897        0.0170
      6          0.0190    0.8917    0.0169    0.9066        0.0149
      7          0.0164    0.9081    0.0124    0.9190        0.0109
      8          0.0123    0.9204    0.0095    0.9285        0.0081
      9          0.0093    0.9297    0.0075    0.9360        0.0063
     10          0.0094    0.9391    0.0061    0.9421        0.0030

D = Max |F_0(X) - S_n(X)| = 0.0287
At 0.01 level of significance, K-S statistic = $1.63/\sqrt{6891}$ = 0.0195
D > 0.0195
Therefore, data from Chemical Abstracts do not fit Lotka's law.

It should be stressed that Lotka's inverse square law is a general,
theoretical estimate of productivity. The appeal of a hard and fast
distribution cannot be denied. However, Lotka's law is not a precise
statistical distribution. Rather, it is a generalization based upon two
samples.

TABLE 2
LOTKA, AUERBACH DATA
PROPORTION OF AUTHORS

No.
Contributions   Observed   S_n(X)   Expected   F_0(X)   |F_0(X) - S_n(X)|
      1          0.5917    0.5917    0.6079    0.6079        0.0162
      2          0.1540    0.7457    0.1520    0.7599        0.0142
      3          0.0958    0.8415    0.0675    0.8274        0.0141
      4          0.0377    0.8792    0.0380    0.8654        0.0138
      5          0.0249    0.9041    0.0243    0.8897        0.0144
      6          0.0211    0.9252    0.0169    0.9066        0.0186
      7          0.0143    0.9395    0.0124    0.9190        0.0205
      8          0.0143    0.9538    0.0095    0.9285        0.0253
      9          0.0045    0.9583    0.0075    0.9360        0.0223
     10          0.0053    0.9636    0.0061    0.9421        0.0215

D = Max |F_0(X) - S_n(X)| = 0.0253
At 0.01 level of significance, K-S statistic = $1.63/\sqrt{1325}$ = 0.0448
D < 0.0448
Therefore, the Auerbach data fit Lotka's law.

Given Coile's analysis of the work of Murphy and Schorr, and
given that even Lotka's data do not exactly fit his inverse square law, it
would be useful to examine the literature on and associated with Lotka's
law. Coile emphasizes that for statistical comparisons to be made to
Lotka's work, Lotka's methodology should be followed. This leads to
the problem of identifying which of the factors of Lotka's methodology
are most significant. In the following review of the literature, an
attempt is made to identify these factors.

Literature of Lotka's Law

Many discussions of Lotka's law begin with a statement to the effect
that the distribution has previously been shown to hold in various
subject fields. Turkeli, Krisciunas, Hubert, and Allison and Stewart are
examples.9 To quote from some of these authors:

It (Lotka's law) has been shown to hold for the productivity patterns
of chemists, physicists, mathematicians, and econometricians.

The productivity of scientists has been a subject of inquiry ever since
the pioneering investigation of Lotka, and others have since carried
out Lotka's type of investigation.

Lotka's inverse square law of scientific productivity has since been
shown to fit data drawn from several widely varying time periods and
disciplines.

While some of these studies do not cite sources, those that do often cite
Derek de Solla Price's Little Science, Big Science.13 Those that go
beyond Price cite Dresden, Dufrenoy, Davis, Williams, Zipf, Leavens,
and Simon.14 Several authors, following Price's lead, have assumed
Lotka's law to have been proved and have proceeded to discuss why the
distribution occurs, i.e., why some authors produce more or less than
others. These include later works by Price, Bookstein, Allison et al., and
Shockley.15 These efforts to explain and refine Lotka's formulation are
interesting and valuable. In looking at the work of these authors,
however, it appears that some misunderstanding has developed, for, in
fact, most of the studies cited as demonstrating Lotka's law do not
mention Lotka and do not offer comparable data.
Dresden is the earliest author cited in relation to Lotka's law.16
Although Hubert refers to Dresden's article as subsequent to Lotka's
work, it did, in fact, appear in 1922. Dresden lists authors who presented papers at the regular meetings of the Chicago section of the
American Mathematical Society (AMS). While Dresden does mention
that 59 percent of the papers were later published, he is not concerned
with the publishing behavior of the authors involved. Hubert claims
that Dresden studied the output of American mathematicians. Actually, the authors studied were members of a regional section of AMS.
Dresden's purpose is to provide a record of the work of the Chicago
section of the AMS, not to make a generalization about the productivity
of mathematicians. To do so from Dresden's figures would be misleading, because the Chicago section of the AMS may not be representative
of all mathematicians, and because the figures apply to presented papers, not publications. Dresden's work is interesting, but its relation to
Lotka's law is questionable.
Dufrenoy attempted to study the publishing behavior of biologists
by analyzing the index to the Review of Applied Mycology for 1932,
1934 and 1935, and papers published in volumes 115, 118 and 120 of
Comptes Rendus de la Société de Biologie (1932, 1934, 1935). He is
interested in the publishing behavior of biologists on an annual basis,
not in the rate of productivity over time as Lotka is. Dufrenoy does not
even cite Lotka, let alone attempt to apply Lotka's inverse square law to his
data.
Davis,19 writing in 1941, is the first author to cite Lotka in the fifteen years
following Lotka's original article. He also used Dresden's data, thus
linking the two authors. Davis was interested in presenting data to show
that the distribution of individuals in one of a variety of endeavors
would approximate a Pareto distribution when the measure of that
endeavor is sufficiently large. The ability to publish is one such endeavor. Another example used by Davis plots the billiards scores of seventy-nine faculty members at Indiana University. Davis plots the data from
Lotka and Dresden and finds that they resemble the Pareto distribution,
although the slope of their data is closer to -2 than to the expected Pareto
exponent of -1.5. No statistical tests for goodness of fit are applied. Davis
offers no new data on author productivity and is not concerned that
Dresden is describing papers presented at meetings, while Lotka is
describing published articles. He does provide a valuable service by
citing both Dresden and Lotka for the benefit of later researchers.
(Incidentally, the slope for the plotted billiards scores is -1.867.)
Williams uses Dufrenoy's data from the Review of Applied Mycology for 1935 and compiles his own figures from volume 1 (1913) and
volume 24 (1936) of the Review of Applied Entomology. As with
Dufrenoy, Williams analyzes publishing behavior of authors in individual years of individual journals and does not discuss the rates of author
productivity over time. Williams also does not cite Lotka and does not
appear to be familiar with Lotka's work.
In Human Behavior and the Principle of Least Effort, Zipf has a
chapter titled "The Distribution of Economic Power and Prestige." Zipf
discusses the authorship of scientific articles as an indication of prestige
and cites Lotka, Dresden and Davis. Zipf is the first to call the inverse
square rule "Lotka's law" and discusses it as an approximation, not a
rigid distribution. Accepting Lotka's formulation and Davis's interpretation of Dresden, Zipf also speculates on why some authors publish
more than others.21 No new data are presented and no statistical tests are
made of the available data of Dresden and Lotka.
Leavens in 1953 based his study of econometricians on the work of
721 authors who presented papers at meetings of the Econometric
Society or had articles published in the first twenty volumes of Econometrica (1933-52). He does not cite or mention Lotka. While his data
cover an extensive period of time, they represent only one journal in a
relatively small field compared to Lotka's study of physics and chemistry. Leavens counts unpublished papers read at meetings and counts all
authors where Lotka counted only the senior author. Still, using the K-S
test, Leavens's data do fit Lotka's law (see tables 3 and 4).22 The major
factor that Lotka and Leavens have in common is that both of their
studies cover a substantial period of time.

TABLE 3
LEAVENS, PAPERS PRESENTED AT MEETINGS OF THE ECONOMETRIC SOCIETY
OR IN Econometrica, 1933-52

No.             No.            Percent of     Total No.
Contributions   Contributors   Contributors   Contributions
      1             436           60.47            436
      2             107           14.84            214
      3              61            8.46            183
      4              40            5.55            160
      5              14            1.94             70
      6              23            3.19            138
      7               6            0.83             42
      8              11            1.53             88
      9               1            0.14              9
     11               4            0.55             44
     12               2            0.28             24
     13               3            0.42             39
     14               2            0.28             28
     16               1            0.14             16
     17               2            0.28             34
     18               1            0.14             18
     23               1            0.14             23
     24               1            0.14             24
     28               2            0.28             56
     30               1            0.14             30
     37               1            0.14             37
     46               1            0.14             46
  TOTAL             721          100.00          1,759

Simon, in an article appearing in Biometrika in 1955 and reprinted
in his Models of Man in 1957, cites Davis and Leavens.23 In observing
how these and other data culled from many sources and involving word
frequencies, city sizes and income distribution fit the Yule distribution,
Simon uses the figures compiled by Lotka and Dresden, but cites neither
writer directly and does not mention Lotka. Rather, he provides a
reference to Davis. Lotka is listed in the index to Models of Man, but for
an article on a different topic. Establishing a theoretical distribution for
the data from Lotka, Dresden and Leavens, Simon claims that the fit is
reasonably good without applying any statistical tests. As with Davis
and Zipf, Simon offers no new data and does not attempt to find
statistical support for what has become known as Lotka's law.
TABLE 4
LEAVENS
PROPORTION OF AUTHORS

No.
Contributions   Observed   S_n(X)   Expected   F_0(X)   |F_0(X) - S_n(X)|
      1          0.6047    0.6047    0.6079    0.6079        0.0032
      2          0.1484    0.7531    0.1520    0.7599        0.0068
      3          0.0846    0.8377    0.0675    0.8274        0.0103
      4          0.0555    0.8932    0.0380    0.8654        0.0278
      5          0.0194    0.9126    0.0243    0.8897        0.0229
      6          0.0319    0.9445    0.0169    0.9066        0.0379
      7          0.0083    0.9528    0.0124    0.9190        0.0338
      8          0.0153    0.9681    0.0095    0.9285        0.0396
      9          0.0014    0.9695    0.0075    0.9360        0.0335

n = 721
D = Max |F_0(X) - S_n(X)| = 0.0396
At the 0.01 level of significance, K-S statistic = $1.63/\sqrt{721}$ = 0.0607
D < 0.0607
Therefore, Lotka's law holds for Leavens's data.

In 1963, Price's Little Science, Big Science appeared. Price claims
that Lotka and several others have shown that whenever data are drawn
from an index extending over a number of years sufficient to enable
those who can produce more than a couple of papers to do so,...the
result...is an inverse square law of productivity.24 He discusses Lotka's
data from Chemical Abstracts and refers the reader to Simon for a fuller
analysis and justification. Price plots data from an analysis of the
abridged Philosophical Transactions of the Royal Society of London
for the seventeenth and early eighteenth centuries. He suggests that
these new data fit Lotka's law, but he does not provide the actual figures
or perform a statistical test for goodness of fit. Price's principal interest
is in discussing how to modify Lotka's law in order to account accurately for authors of high productivity, i.e., those who produce fifteen or
more papers. This refinement is necessary, Price says, since otherwise
the maximum scores of published papers in a lifetime would be thousands and even tens of thousands rather than the several hundreds that
seem to represent even the most prolific scientific lives.25 The modification of Lotka's law is, as mentioned earlier, the subject of several
articles, notably those by Bookstein and by Allison et al.26
In a 1969 review article, Fairthorne is the first to link the distributions of Bradford, Zipf, Mandelbrot, and Lotka. While he does not cite
Price, Fairthorne does mention that Lotka's relation underestimates
the number of more prolific authors but applies fairly well for the less
prolific.27 Naranan and Bookstein also observe that many bibliometric
distributions are essentially the same.28
With the exception of Leavens, no new data fitting Lotka's law are
found in the above articles, and the figures from Leavens could be
suspect. Yet presumably these studies are the ones invoked as proof of
the applicability of Lotka's law by later authors, e.g., "It has been
shown to hold for the productivity patterns of chemists, physicists,
mathematicians, and econometricians."29 In point of fact, no published
article attempts to apply or test Lotka's law until Murphy in 1973. A
critique of Murphy's article is provided by Coile and is described above;
Hubert also faults Murphy.30
After Murphy, the next published application of Lotka's law is
Voos in a 1974 study of information science. Taking his data from all
articles indexed in Information Science Abstracts for 1966-70, Voos
proposes that the inverse square law does not hold for information
science and that -3.5 is a better constant for this particular discipline.31
The error Voos makes is pointed out by Coile in a subsequent letter to
the editor.32 Voos lists the five years under study separately and then
simply adds the tabulations for the individual years to arrive at a total
for the five years; i.e., the numbers of authors publishing one paper in
1966, 1967, 1968, 1969, and 1970 were added together to arrive at a figure
for all authors publishing one paper. Thus, an author publishing one
paper per year would be credited with only one paper for the five years
and not five, as he should be. As Coile points out, Voos is studying
single years of data whereas Lotka studied a number of years. Like
Dufrenoy, Voos defines an important area for research in analyzing
author productivity on an annual basis.
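The difference between the two ways of tabulating can be made concrete with a small Python sketch. The records below are entirely hypothetical (they are not Voos's data), and the function names are illustrative; the point is only that summing yearly tabulations and tabulating over the whole period give different distributions for the same papers.

```python
from collections import Counter

# Hypothetical records, one (author, year) pair per paper.
papers = [
    ("Author A", 1966), ("Author A", 1967), ("Author A", 1968),
    ("Author A", 1969), ("Author A", 1970),
    ("Author B", 1966), ("Author B", 1966),
    ("Author C", 1968),
]

def tabulate_whole_period(records):
    """Map k -> number of authors with k papers over the whole period."""
    papers_per_author = Counter(author for author, _ in records)
    return Counter(papers_per_author.values())

def tabulate_summed_years(records):
    """Map k -> number of authors with k papers in a single year,
    summed over the years (the procedure criticized above)."""
    papers_per_author_year = Counter(records)  # key: (author, year)
    return Counter(papers_per_author_year.values())

whole = tabulate_whole_period(papers)
summed = tabulate_summed_years(papers)
print("whole period:", dict(whole))
print("summed years:", dict(summed))
```

In this toy example Author A, with one paper in each of five years, appears once as an author of five papers under the whole-period tabulation but five times as an author of one paper under the summed-years tabulation, which inflates the low-productivity end of the distribution and steepens the fitted slope.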
Schorr has published three articles dealing with Lotka's law in
library science, history of legal medicine, and map librarianship. The
faults of the last article are documented by Coile as described earlier.
The first article is similarly flawed because, as Tudor points out in a
subsequent letter to the editor, Schorr uses only two journals, College
& Research Libraries and Library Quarterly, for 1963-72. Schorr concludes that the data on the history of legal medicine do not fit Lotka's
law. Tudor terms Schorr's article a "frivolous bagatelle," but it did
reawaken interest in Lotka. However, the choice of such a restricted
subject field contrasts sharply with Lotka's use of the topics of physics
and chemistry.
Rogge attempts to apply Lotka's law to the literature of anthropology. He cites Lotka and claims that Lotka's law has been tested
positively many times. Using the 40-year cumulative index of the
American Anthropologist (1888-1928) and the 30-year cumulative index
of American Antiquity (1935-65), Rogge concludes that it was clear
that at least this portion of the anthropological literature was produced
in accordance with Lotka's law.36 However, Rogge does not provide
the data or even a summary of his statistical findings. Even with data,
the study would cover only two periodicals and not the whole body of
literature in anthropology.
The most recent attempt to apply Lotka's law was made in 1979 by
Radhakrishnan and Kernizan in the field of computer science.37 These
authors studied papers published during 1968-72 in Communications
of the Association for Computing Machinery (CACM) and in the Journal of the ACM (JACM). The same objection applied to Schorr's and
Rogge's articles applies here: data are drawn from two journals only.
The authors admit that this is a problem but contend that it is
noteworthy that, for a single journal, the fitted line will have a slope of
approximately -3. This is, of course, interesting, and might be linked to
Dufrenoy's and Williams's studies of a single journal. In a second
experiment, the authors selected two random samples of three hundred
authors, one sample each from CACM and JACM, and checked these
authors in the cumulative index to Computer and Control Abstracts
covering 1969-72 to determine the number of publications per author.
They found that Lotka's law did not apply, but wisely caution against
drawing a negative conclusion about the satisfaction of Lotka's law
from this single experiment.38 They go on to point out the need for a
large-scale test of Lotka's law using a large, comprehensive machine-readable file, such as Engineering Index. To date, no such test has been
reported.
Perhaps the most ambitious work to date in the study of Lotka's law
has been done by Jan Vlachý. In an article appearing in 1972, Vlachý
observes the role of several variables which might influence how
appropriate Lotka's law is to a given set of data.39 He examined bibliographies in many subject areas and listed the number of years covered by
each source, the number of papers and authors represented, and the
slope of the fitted line. While the data presented are interesting, Vlachý
does not attempt to test the applicability of Lotka's law, nor does he
provide sufficient data for others to perform statistical tests on his data.
In this and a later article, Vlachý discusses how the slope of the fitted
line varies both according to the number of years covered and according
to Vlachý's division of the communities [of authors]...into universal,
national, [international,] and those in journals.41 Vlachý is mainly
concerned with how these two variables affect the slope of the fitted line,
i.e., the exponent in Lotka's formulation, and not with the appropriateness of Lotka's law. He also evaluated earlier studies as follows: "By
analyzing the results of the previous studies, however, it was found that
their scope and applicability is limited, since, first, their sampling
background does not go much beyond the original data brought by
Lotka and his early followers and, second, some basic concepts involved
in these studies are anticipated without ever being thoroughly investigated." Vlachý also compiled A Bibliography of Lotka's Law and
Related Phenomena.43 This comprehensive bibliography lists works
of interest not only on Lotka but also on the related laws of Bradford and
Zipf, as well as bibliometrics and frequency distributions in general.
In a 1975 letter to the editor of the Journal of Documentation, Coile
criticizes Kochen's discussion of authorship in the latter's Principles of
Information Retrieval.44 In this letter, Coile offers some useful insights
into how the work of Leavens, Simon, Davis, and Dresden came to be
associated with Lotka.45

Lotka's Law and Monograph Productivity

From this review of the literature, it can be argued that there have
been no studies that replicate Lotka's methodology closely enough to be
compared to Lotka's original work. Few of the authors of these studies
should be faulted for this, because until Murphy's paper in 1973, no one
attempted to compile new data to compare to Lotka's findings. Rather,
earlier work by Dresden, Dufrenoy, Davis, Williams, and Leavens
became associated with Lotka's work by subsequent authors and cited
by some as providing proof of Lotka's law. Murphy, Schorr, Voos, and
others in the 1970s sought to test Lotka's law in various disciplines, but
failed to match the conditions under which Lotka conducted his study,
usually because a suitable bibliographic source was not available.
Vlachý identified two variables which influence the distribution of
author productivity: (1) the time period under study, and (2) the community of authors involved. None of the studies discussed above matches
Lotka's study in both these variables. Lotka's study covered ten years for
the Chemical Abstracts figures, and all of history up to 1900 for Auerbach. Those that do match or surpass Lotka in time period, notably
Rogge, do not match him in the selection of a community of authors. In
Lotka's study of Chemical Abstracts, the community consists of all
senior authors whose work was included in the 1907-16 decennial index.
In his study of Auerbach, the community of authors consists of authors
of the most notable works in the field of physics up to 1900. In most
studies of author productivity, it is usually the subject field that defines
a community of authors, because that is how journals and bibliographies are organized and because researchers are often interested in
studying a particular field. Most subsequent studies single out one or
two journals or study only a few years. These works are often significant
in and of themselves, and contribute greatly to our understanding of
author productivity and behavior. However, they should not be compared to Lotka's work without much caution.
There have been two recent studies which might be comparable to
Lotka's work in terms of the time period and the community of authors.
However, both deal with monographic literature, not journal articles.
One is a study done by the Library of Congress (LC) of all author
headings on its MARC tapes.46 The other is a study of personal authors
in the University of Illinois Library card catalog.47 Both studies differ
from Lotka's in that all authors, not just the senior authors, are counted.
Lotka never discloses why he counted only senior authors. A look at the
first decennial index to Chemical Abstracts reveals a possible
explanation. If an article has four or fewer authors, all authors are indexed.
However, the second, third and fourth authors will have only a "see"
reference to the first author, not to the number of articles written by the
authors together. Thus, to compile all authors, Lotka would have had
to refer to the first author. A quick sample shows that over 20 percent of
the author entries have "see" references. Considering that Lotka
tabulated all authors whose surnames began with A or B, and that from 272
pages this resulted in 6891 authors, it is not surprising that he might
have balked at this added chore.
The data from the University of Illinois Library catalog are shown
in table 5. The Illinois catalog contains records for about 2.5 million
titles. A random sample of 2345 personal authors was drawn. Plotting
the first 29 observations on a log scale, the slope for the data is -2.0903,
very close to Lotka's theoretical slope of -2. The K-S test in table 6
shows that the Illinois data do indeed fit. It should be pointed out that
the five most prolific authors in the Illinois study are Shakespeare,
Milton, Goethe, Balzac, and Dickens. None of these authors is still
writing, but their works continue to be published, a feature Lotka did
not face.
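
The slope calculation described above can be sketched in a few lines. This is a minimal illustration only: the article does not say how the line was fitted, so ordinary least squares on the logarithms of both variables is an assumption, and the names ILLINOIS and loglog_slope are hypothetical. The data are the first 29 rows of table 5; the author count for 28 works is taken as 4, which is what the 112 entries and 0.17 percent listed for that row imply.

```python
import math

# (works per author, number of authors): first 29 rows of table 5.
ILLINOIS = [
    (1, 1489), (2, 343), (3, 160), (4, 92), (5, 44), (6, 35), (7, 27),
    (8, 18), (9, 12), (10, 11), (11, 10), (12, 9), (13, 2), (14, 6),
    (15, 9), (16, 8), (17, 3), (18, 2), (19, 2), (20, 5), (21, 5),
    (22, 1), (23, 1), (24, 2), (26, 1), (27, 1), (28, 4), (30, 2), (31, 1),
]

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) (assumed fitting method)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

slope = loglog_slope(ILLINOIS)
print(round(slope, 4))  # compare with the -2.0903 reported in the text
```

Under these assumptions the fitted slope should land near -2, the exponent of Lotka's inverse-square law; small differences from the published figure would be expected if the original fit used another method.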
The LC study of its MARC tapes covers 1,336,182 machine-readable
catalog records established between 1969 and 1979, with 695,074 unique
personal name headings. The results are shown in table 7. Plotting the
first 10 points, the slope of the data is -2.3450. Intuitively, this will not fit
Lotka's theoretical distribution. Applying the K-S test to the first
observation, D is 0.6565 - 0.6079 = 0.0486; the K-S statistic is
1.63/√695,074 = 0.0020. The value of D is greater than the K-S statistic;
therefore, the data do not fit Lotka's law.
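
The K-S comparison used for both data sets can be sketched as follows. This is an illustration under stated assumptions, not the author's own procedure: the theoretical proportions are taken as (6/π²)/n², which reproduces the 0.6079 used in the text, and the function name ks_against_lotka is hypothetical. The Illinois counts are the first nine rows of table 5, and the LC count is the first row of table 7.

```python
import math

LOTKA_C = 6 / math.pi ** 2  # about 0.6079: theoretical share of one-work authors

def ks_against_lotka(counts, total):
    """Return (D, critical value) for author counts at 1, 2, 3, ... works.

    counts[i] is the number of authors with i + 1 works; total is the full
    sample size, including authors beyond the rows supplied.
    """
    d = 0.0
    cum_obs = 0.0
    cum_theo = 0.0
    for n, count in enumerate(counts, start=1):
        cum_obs += count / total
        cum_theo += LOTKA_C / n ** 2
        d = max(d, abs(cum_theo - cum_obs))
    critical = 1.63 / math.sqrt(total)  # 0.01 level of significance
    return d, critical

illinois = [1489, 343, 160, 92, 44, 35, 27, 18, 12]  # table 5, 1-9 works
d_ui, crit_ui = ks_against_lotka(illinois, 2345)

lc_first = [456328]                                  # table 7, first row only
d_lc, crit_lc = ks_against_lotka(lc_first, 695074)

print(d_ui < crit_ui)  # Illinois: D falls below the critical value
print(d_lc > crit_lc)  # LC: D exceeds the critical value
```

Because the critical value shrinks with the square root of the sample size, a sample as large as the LC file passes only if it follows the theoretical proportions almost exactly.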

TABLE 5
UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN
STUDY OF PERSONAL AUTHORS IN THE CARD CATALOG

No.       No.        % Total    Total No.
Works     Authors    Sample     Entries

    1      1,489      63.50       1,489
    2        343      14.63         686
    3        160       6.82         480
    4         92       3.92         368
    5         44       1.88         220
    6         35       1.49         210
    7         27       1.15         189
    8         18       0.77         144
    9         12       0.51         108
   10         11       0.47         110
   11         10       0.43         110
   12          9       0.38         108
   13          2       0.09          26
   14          6       0.26          84
   15          9       0.38         135
   16          8       0.34         128
   17          3       0.13          51
   18          2       0.09          36
   19          2       0.09          38
   20          5       0.21         100
   21          5       0.21         105
   22          1       0.04          22
   23          1       0.04          23
   24          2       0.09          48
   26          1       0.04          26
   27          1       0.04          27
   28          4       0.17         112
   30          2       0.09          60
   31          1       0.04          31
   32          3       0.13          96
   33          1       0.04          33
   34          1       0.04          34
   35          1       0.04          35
   36          3       0.13         108
   38          2       0.09          76
   39          1       0.04          39
   40          2       0.09          80
   42          2       0.09          84
   44          2       0.09          88
   47          1       0.04          47
   48          1       0.04          48
   49          1       0.04          49
   51          1       0.04          51
   58          1       0.04          58
   63          1       0.04          63
   66          1       0.04          66
   70          1       0.04          70
   90          1       0.04          90
  111          1       0.04         111
  115          1       0.04         115
  149          1       0.04         149
  167          1       0.04         167
  231          1       0.04         231
  266          1       0.04         266
  298          1       0.04         298
  379          1       0.04         379
  592          1       0.04         592
  652          1       0.04         652
  835          1       0.04         835
1,374          1       0.04       1,374
1,490          1       0.04       1,490

TOTALS     2,345     100.00      13,148

TABLE 6
UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN
PROPORTION OF AUTHORS

Titles/    Theoretical              Observed
Author     (Lotka)       F0(X)      (Illinois)    Sn(X)     |F0(X) - Sn(X)|

   1        0.6079       0.6079      0.6350       0.6350        0.0271
   2        0.1520       0.7599      0.1463       0.7813        0.0214
   3        0.0675       0.8274      0.0682       0.8495        0.0221
   4        0.0380       0.8654      0.0392       0.8887        0.0233
   5        0.0243       0.8897      0.0188       0.9075        0.0178
   6        0.0169       0.9066      0.0149       0.9224        0.0158
   7        0.0124       0.9190      0.0115       0.9339        0.0149
   8        0.0095       0.9285      0.0077       0.9416        0.0131
   9        0.0075       0.9360      0.0051       0.9467        0.0107

D = Max |F0(X) - Sn(X)| = 0.0271

At the 0.01 level of significance, K-S statistic = 1.63/√2345 = 0.0337

D < 0.0337
Therefore, UI Library data fit Lotka's law.

TABLE 7
LIBRARY OF CONGRESS
ANALYSIS OF PERSONAL NAME HEADINGS ON MARC TAPES

No.              No. Distinct    % Distinct
Occurrences      Headings        Headings

1                  456,328         65.65
2                  119,681         17.22
3                   46,247          6.65
4                   23,951          3.45
5                   13,820          1.99
6                    8,790          1.26
7                    5,827          0.84
8                    4,056          0.58
9                    2,998          0.43
10                   2,153          0.31
11-13                4,116          0.59
14-20                3,748          0.54
21-50                2,678          0.39
51-100                 448          0.06
101-200                149          0.02
201-300                 47          0.01
301-400                 19          0.00
401-500                 11          0.00
501-1000                 5          0.00
1001+                    2          0.00

Total              695,074         99.99

Why the LC figures do not fit, while the Illinois figures do, is open
to conjecture. One reason might be that the LC data include persons
occurring as subjects as well as authors. Another possible cause is that
the Illinois figures cover authors from the beginning of history to the
present, while LC figures cover catalog records established over ten
years. This could also be the reason Lotka's Auerbach figures fit, but not
the Chemical Abstracts data. In any event, the fact that an exact fit is
lacking in the Library of Congress figures is not as important as the
emergence of a general rule which implies that a sufficiently large
sample of a broad community of authors and a large time span will
approximate Lotka's law.
It is of further interest to note that both the LC and Illinois figures
were compiled for a practical management problem: planning for the
implementation of the second edition of the Anglo-American Cataloging
Rules. It is not uncommon for other bibliometric formulations to be
used for practical planning, notably Bradford's distribution for
planning periodical collections. This, however, is the first known case
where Lotka's law has been useful in planning.


Conclusion

It has been seen that Lotka's law fits only a portion of the data from
his 1926 study and that his most-cited figures, those for Chemical
Abstracts from 1907 to 1916, do not fit his distribution. Later studies
assume that Lotka's law had been proven to apply in a variety of subject
areas, when in fact it had not. No data were compiled for the express
purpose of verifying the law until the 1970s, and these recent studies,
while valuable and useful, are not comparable to Lotka's study in terms
of the time period covered and the community of authors involved.
Recent studies of monograph productivity suggest that Lotka's law
might reflect an underlying pattern in the behavior of those people who
produce publications, whether those publications are books or journal
articles. It would appear that when the time period covered is ten years
or more and the community of authors is defined broadly, author
productivity approximates the frequency distribution that Lotka
observed and that has become known as Lotka's law. If this is correct,
then there is a universal community of all authors who have ever
published whose pattern of productivity might approximate Lotka's
law. Within this universal community, there are many subcommunities
defined, as Vlachý points out, by discipline, nation, institution,
journal, etc. Even time could be used as a dimension to define a
subcommunity. All studies of author productivity are concerned with a
subset of the universal community of authors. The smaller the subset, the
less likely it will be that the measurements of productivity reflect the
measurements for the universal community, although these measurements may
be useful and valuable in studying that particular subset. However, the
larger and more representative the subset, the more closely it will
resemble the universal community. The subsets studied by Lotka and
those represented in the Library of Congress study of its MARC tapes
and in the study of the University of Illinois Library card catalog are the
largest yet considered, and the similarity of their patterns of author
productivity and behavior suggests that broader patterns do indeed exist.
The above review of literature associated with Lotka's law suggests
several areas for future research. First, the work of Dufrenoy and others
on the annual productivity of authors points to an interesting measure
of author behavior. Second, Radhakrishnan and Kernizan make a
convincing argument for the use of large-scale machine-readable data bases
in the study of author productivity. They suggest that the machine
version of Engineering Index could be used, and this would be
especially interesting in that Engineering Index is a multidisciplinary data
base with records that are well indexed. Thus, subsets could be defined
by a number of factors (subject, date, country, etc.), and the
productivity of authors within these subsets could be determined and compared
relatively easily. Studies of large bibliographic data bases could also
lead to some standardization of methodology. Third, the concept
derived from Vlachý of a universal community of authors needs to be
explored further. Given that such a universal community exists, and
that all studies of author productivity are based upon subsets, or
subcommunities, of this universal community, then some work could be
done on which factors used to define the subsets are most important,
i.e., time, subject, language, format of publication, etc. Finally, the use
of a univariate model like Lotka's law, where the response of one
variable to another is measured, may oversimplify the complex subject
of author productivity. The factors mentioned above that serve to define
communities of authors, as well as other factors, might be included as
variables in a more sophisticated model for measuring and predicting
author productivity. More complex models will be more difficult to
understand, but the inclusion of relevant variables in a multivariate
model may result in a model that better simulates reality and thus is
more useful.

References
1. Lotka, Alfred J. "The Frequency Distribution of Scientific Productivity."
Journal of the Washington Academy of Sciences 16(19 June 1926):323.
2. Coile, Russell C. "Lotka's Frequency Distribution of Scientific Productivity."
Journal of the ASIS 28(Nov. 1977):366.
3. Murphy, Larry J. "Lotka's Law in the Humanities?" Journal of the ASIS
24(Nov.-Dec. 1973):461-62.
4. Schorr, Alan E. "Lotka's Law and Map Librarianship." Journal of the ASIS
26(May-June 1975):189-90.
5. Decennial Index to Chemical Abstracts, vols. 1-10, 1907-1916. Easton, Pa.:
American Chemical Society, n.d.
6. Auerbach, Felix. Geschichtstafeln der Physik. Leipzig: Barth, 1910.
7. Schorr, Alan E. "Lotka's Law and Library Science." RQ 14(Fall 1974):32-33.
8. Voos, Henry. "Lotka and Information Science." Journal of the ASIS
25(July-Aug. 1974):270-72.
9. Turkeli, Arif. "The Doctoral Training Environment and Post-Doctorate
Productivity Among Turkish Physicists." Science Studies 3(1978):311-18; Krisciunas,
Kevin. Letter to the editor in Journal of the ASIS 28(Jan. 1977):65-66; Hubert, John J.
Letter to the editor in Journal of the ASIS 28(Jan. 1977):66; and Allison, Paul D., and
Stewart, John A. "Productivity Differences among Scientists: Evidence for Accumulative
Advantage." American Sociological Review 39(Aug. 1974):596-606.
10. Krisciunas, letter to the editor, p. 65.
11. Turkeli, "Doctoral Training Environment," p. 311.
12. Allison and Stewart, "Productivity Differences among Scientists," p. 596.
13. Price, Derek de Solla. Little Science, Big Science. New York: Columbia
University Press, 1963.

14. Dresden, Arnold. "A Report on the Scientific Work of the Chicago Section,
1897-1922." American Mathematical Society Bulletin 28(July 1922):303-07; Dufrenoy,
Jean. "The Publishing Behavior of Biologists." Quarterly Review of Biology 13(June
1938):207-10; Davis, Harold T. The Analysis of Economic Time Series. Bloomington,
Ind.: Principia Press, 1941. See also Davis, Harold T. Theories of Econometrics.
Bloomington, Ind.: Principia Press, 1941, pp. 45-50; Williams, C.B. "The Numbers of
Publications Written by Biologists." Annals of Eugenics 12(1944):143-46; Zipf, George K.
Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology.
Cambridge, Mass.: Addison-Wesley, 1949; Leavens, Dickson H. Letter to the editor in
Econometrica 21(Oct. 1953):630-32; and Simon, Herbert A. "On a Class of Skew
Distribution Functions." Biometrika 42(Dec. 1955):425-40; reprinted in Simon,
Herbert A. Models of Man. New York: Wiley, 1957.
15. Price, Derek de Solla, and Gursey, S. "Studies in Scientometrics. Part I.
Transience and Continuance in Scientific Authorship." International Forum on
Information and Documentation 1(1976):17-24; Bookstein, Abraham. "The Bibliometric
Distributions." Library Quarterly 46(Oct. 1976):416-23; Bookstein, Abraham. "Patterns of
Scientific Productivity and Social Change." Journal of the ASIS 28(July 1977):206-10;
Allison, Paul D., et al. "Lotka's Law: A Problem in Its Interpretation and Application."
Social Studies of Science 6(1976):269-76; and Shockley, William. "On the Statistics of
Individual Variations of Productivity in Research Laboratories." Proceedings of the
Institute of Radio Engineers 45(March 1957):279-90.
16. Dresden, "Report on the Scientific Work."
17. Hubert, letter to the editor, p. 66.
18. Dufrenoy, "Publishing Behavior of Biologists."
19. Davis, Analysis of Economic Time Series.
20. Williams, "Publications Written by Biologists."
21. Zipf, Human Behavior.
22. Leavens, letter to the editor.
23. Simon, "On a Class of Skew Distribution Functions."
24. Price, Little Science, p. 43.
25. Ibid., pp. 48-49.
26. Bookstein, "Patterns of Scientific Productivity"; and Allison, et al., "Lotka's Law."
27. Fairthorne, Robert A. "Progress in Documentation." Journal of
Documentation 25(Dec. 1969):325.
28. Naranan, S. "Power Law Relations in Science Bibliography-A Self-Consistent
Interpretation." Journal of Documentation 27(June 1971):83-97; and Bookstein,
"Bibliometric Distributions."
29. Krisciunas, letter to the editor, pp. 65-66.
30. Coile, "Lotka's Frequency Distribution"; and Hubert, letter to the editor.
31. Voos, "Lotka and Information Science."
32. Coile, Russell C. Letter to the editor in Journal of the ASIS 26(March-April
1975):133.
33. Schorr, Alan E. "Lotka's Law and the History of Legal Medicine." Research in
Librarianship 30(Sept. 1975):205-09.
34. Tudor, Dean. Letter to the editor in RQ 14(Winter 1974):187.
35. Rogge, A.E. "A Look at Academic Anthropology." American Anthropologist
78(Dec. 1976):835.
36. Ibid.
37. Radhakrishnan, T., and Kernizan, R. "Lotka's Law and Computer Science
Literature." Journal of the ASIS 30(Jan. 1979):51-54.
38. Ibid., p. 54.
39. Vlachý, Jan. "Variable Factors in Scientific Communities (Observations on
Lotka's Law)." Teorie a Metoda 4(1972):91-120.
40. Vlachý, Jan. "Time Factor in Lotka's Law." Probleme de Informare si
Documentare 10(1976):44-87.

41. Ibid., p. 48.
42. Ibid., p. 46.
43. Vlachý, Jan, comp. "Frequency Distribution of Scientific Performance: A
Bibliography of Lotka's Law and Related Phenomena." Scientometrics, Bibliography
Section 1(1978):109-30.
44. Coile, Russell C. Letter to the editor in Journal of Documentation 31(Dec.
1975):298-301; see also Kochen, Manfred. Principles of Information Retrieval.
Los Angeles: Melville, 1974.
45. Other works which cite and discuss Lotka to some extent include: Aiyepeku,
Wilson O. "The Productivity of Geographical Authors: A Case Study from Nigeria."
Journal of Documentation 32(June 1976):105-17; Cole, Jonathan R., and Cole, Stephen.
"The Ortega Hypothesis." Science 178(Oct. 1972):368-75; Mantell, Leroy H. "On Laws of
Special Abilities and the Production of Scientific Literature." American Documentation
17(Jan. 1966):8-16; and Narin, Francis, et al. Evaluative Bibliometrics: The Use of
Publication and Citation Analysis in the Evaluation of Scientific Activity. Cherry Hill, N.J.:
Computer Horizons, Inc., 1976. (CH Project No. 704R)
46. McCallum, Sally H., and Godwin, James L. "Statistics of Headings in the MARC
File." Network Development Office, Library of Congress, unpublished paper, 5 Jan. 1981.
47. Potter, William G. "When Names Collide: Conflict in the Catalog and AACR2."
Library Resources & Technical Services 24(Winter 1980):3-16.


Bradford's Law: Theory, Empiricism and the Gaps Between

M. CARL DROTT

M. Carl Drott is Associate Professor, School of Library and Information
Science, Drexel University, Philadelphia.

NATURAL LAWS DESCRIBE PATTERNS which are regular and recurring. The
scientific point of a law is twofold. First, a concrete statement of a law
may give us the ability to better predict events or to shape our
reactions to them. Second, a physical law may help in the development
of theories which explain why a particular pattern occurs. Natural laws
therefore are of interest because they offer the opportunity for empirical
application and for theoretical understanding. On the other hand, the
ability to articulate a law does not automatically guarantee either
empirical or theoretical advances.
Bradford's law begins with a regularity which is observed in the
retrieval or use of published information. Broadly speaking, this
regularity is characterized by both concentration and dispersion of specific
items of information over different sources of information. Thus, for a
search on some specific topic, a large number of the relevant articles will
be concentrated in a small number of journal titles. The remaining
articles will be dispersed over a large number of titles. Throughout the
remaining discussion, journal articles will be used to represent the items
retrieved and journals will be the sources. This is in keeping with most
of the Bradford's law literature, although there is clear evidence that
similar patterns occur for other kinds of items and sources.
The literature on Bradford's law incorporates both theoretical and
empirical aspects. These aspects are each coherent and developing areas
of scientific inquiry. Confusion arises, however, when the two aspects
become mixed. This mixing occurs in the normal course of scholarship.
Authors with empirical data quite properly speculate on what might be
implied in terms of theory. Writers developing theoretical models offer
empirical interpretations as a way of making the abstract more concrete.
It is important for readers and future researchers to separate clearly the
knowledge developed in each aspect from the many unanswered
questions which separate theory from empiricism.
Theoretical Development
The fundamental question in the theoretical study of Bradford's
law is this: What is the nature of the underlying probabilistic events
which aggregate to create the regular pattern of dispersion of articles
over titles? As a first step toward solving this difficult (and as yet
unsolved) problem, it is necessary to have a mathematical description of
the pattern whose appearance we are trying to explain. The first
statement of this mathematical formula came from S.C. Bradford. He
examined all of the journal titles contributing to a bibliography on
applied geophysics. Bradford discovered that he could divide the titles
into three groups, such that each group of titles contributed about the
same number of articles. Starting with the titles which contributed the
most articles, he divided the articles into three roughly equal groups:

    The first 9 titles contributed 429 articles.
    The next 59 titles contributed 499 articles.
    The last 258 titles contributed 404 articles.

The value of this arrangement lies in the number of titles it takes for
each one-third of the articles. In this case, Bradford discovered a
regularity in calculating the number of titles in each of the three groups:

    9 titles
    9 × 5 titles (equals 45 titles)
    9 × 5 × 5 titles (equals 225 titles)

Just as the three groups of articles were not quite equal in size, this
formulation does not quite give the observed number of titles. This
arrangement does have a very special regularity. There is a core of
nine titles which contributes one-third of all the articles. In order to get
the second third of the articles (that is, to add the same number of articles
already found), one needs to search five times as many titles (5 × 9). To
find the last third of the articles (again, to add the same number of
articles as found in the core titles), one must search five times again
(9 × 5 × 5) as many titles. Thus, to show title groups contributing an equal
number of articles, one could write:

    9 : 9 × 5 : 9 × 5²

Recognizing that the size of the core (9) and the multiplier (5) might be
different for other searches, we divide the groups by nine and replace the
multiplier with a variable. This gives groups of titles with sizes:

    1 : a : a²

where each of the three groups of titles contributes the same number of
articles.
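
The idealized group sizes can be generated and set beside Bradford's observed counts with a short sketch. The function name bradford_zone_sizes and its parameters are hypothetical and serve only to illustrate the 1 : a : a² pattern scaled by the size of the core.

```python
def bradford_zone_sizes(core, multiplier, zones=3):
    """Idealized number of titles in each zone: core * multiplier**i."""
    return [core * multiplier ** i for i in range(zones)]

ideal = bradford_zone_sizes(core=9, multiplier=5)
observed = [9, 59, 258]  # titles per group in Bradford's geophysics data

print(ideal)  # [9, 45, 225]
for want, got in zip(ideal, observed):
    # The ratio shows how far each observed group departs from the ideal.
    print(want, got, round(got / want, 2))
```

The second and third observed groups are somewhat larger than the idealized sizes, which is the imperfect fit the text acknowledges.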
This is the first theoretical statement of Bradford's law. Note that
while it was founded on empirical observation, it is not derived strictly
from the data. (As noted above, the data do not quite fit the law either in
the exact number of articles in each group or in matching the calculated
number of titles to the observed number.) As a statement of a natural law
this formulation has several shortcomings. The most serious problem is
that the phenomenon is described in terms of groups of journals. These
rather large aggregations of titles seem to be an artifact of the statement
of the law. That is, it appears that the dispersion of articles over ranked
titles is mathematically regular rank by rank rather than being regular
only for groups. There is also no hint in the formula or its derivation as
to what kind of underlying probabilistic process creates this scattering.
Bradford's formulation also leaves unanswered questions for those
working with empirical data. How does one establish the size of the
core? What is the best value of a for any particular set of data
(recognizing that, as above, no value of a fits the observations exactly)? These
questions are indicative of the gap that arises between empirical and
theoretical consideration of the phenomenon.
Work on clarifying and refining the theoretical statement of
Bradford's law was undertaken by B.C. Vickery, M.G. Kendall,3 F.F.
Leimkuhler,4 and others. The most profound impact on the theoretical
foundation of Bradford's law has come from the efforts of B.C. Brookes.5
Brookes began with Bradford's ratios as portrayed above. Drawing
on the work of Vickery, he derived a formula which did not depend on
groupings of journal titles. The formula was this:

    R(n) = k log (n)

where:

    n is the rank of each journal. In other words, the journal
    contributing the most articles has a rank of 1, the second most
    productive title has a rank of 2, and so on. In assigning ranks,
    every title is given a rank. In the case of ties (titles contributing
    the same number of articles), ranks are arbitrarily assigned to the
    tied journals.

    R(n) is the total number of articles contributed by the first n
    journals. The value of R(1) is simply the number of articles
    contributed by the top title. The value of R(2) is the sum of the
    number of articles contributed by the first journal plus the
    articles contributed by the second-ranked title.

    k is a constant which may be different for each search. It is related
    to the document collection.

Note that this formula can be used to calculate the number of articles
contributed by a journal at any rank. For example, the number of
articles contributed by the fifth-ranked journal is simply R(5) - R(4) (the
total number of articles contributed by the first five titles minus the
number of articles contributed by the first four titles).
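
The rank-by-rank calculation can be sketched as follows. The function names and the value of k are hypothetical, and the natural logarithm is an assumption, since the text does not state the base of the logarithm; a different base would only rescale k.

```python
import math

def cumulative_articles(n, k):
    """R(n) = k log(n): articles contributed by the n top-ranked journals."""
    return k * math.log(n)

def articles_at_rank(n, k):
    """Articles contributed by the journal of rank n alone: R(n) - R(n - 1)."""
    if n == 1:
        return cumulative_articles(1, k)
    return cumulative_articles(n, k) - cumulative_articles(n - 1, k)

k = 100.0  # hypothetical constant for one search
for rank in range(1, 6):
    print(rank, round(articles_at_rank(rank, k), 1))
```

In this unmodified form R(1) = k log(1) = 0, so the formula assigns no articles at all to the top-ranked journal, while each later journal contributes k log(n/(n - 1)), a quantity that shrinks as the rank increases.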
This formulation of Bradford's law allows us to use much greater
mathematical power in the search for an understanding of the theoretical aspects of the problem. One way of seeking this understanding is to
consider what the equation implies about the real world. If predictions
made from theory are obviously false, then we know that there is some
error. Either the theory must be changed, or there must be some restrictions included as to exactly what phenomenon is being described. Note
that the converse is not true. The fact that the theory does fit the world
does not actually prove the truth of the theory.
Brookes used the following approach in refining his formulation.
He considered the predictions which the formula made when the search
retrieved a very large number of articles. In such a situation, the formula
required the number of articles contributed by each of the top-ranked
journals to grow very large. However, we know that there must be a
limit to the number of articles on a topic which any single journal can
publish even if it deals with nothing but the topic. Further, there are a
number of empirical studies which show that the number of articles
contributed by the top-ranked journals is not as high as the formula
would predict. Strictly speaking, the prediction from the formula is too
low for the first journal and too high for the remaining most-used
journals. In fact, for some data sets the formula predicts that the number
of articles contributed by the top-ranked titles will be negative.
In order to account for this disparity, Brookes modified the formula
to include another constant, s:

    R(n) = k log (n/s)

He also imposed the limitation that this statement of Bradford's law
may not hold for the most frequently appearing titles in a data set. This
modification can be viewed as a speculation on the fundamental
theoretical question. That question asks the underlying reason for the
observed regularity. This modification, in essence, says that the
underlying process which creates the regularity may be different from the
process which causes the top-ranked titles to diverge from regularity. In
other words, the behavior of the top-ranked journals may present a
different theoretical problem than the pattern of the remaining titles.
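
Since k log (n/s) = k log (n) - k log (s), the modified formula is a straight line when R(n) is set against log (n), and the two constants can be recovered from such a line. The sketch below is an illustration under stated assumptions: natural logarithms, an ordinary least-squares fit, synthetic noise-free data, and hypothetical names (fit_k_and_s, k_hat, s_hat).

```python
import math

def fit_k_and_s(ranks, cumulative):
    """Fit R = k * ln(n) + b by least squares, then recover s = exp(-b / k)."""
    xs = [math.log(n) for n in ranks]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(cumulative) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, math.exp(-b / k)

# Synthetic data generated from hypothetical values k = 50 and s = 2,
# leaving out the top-ranked titles, for which the law may not hold.
ranks = list(range(5, 101))
data = [50.0 * math.log(n / 2.0) for n in ranks]

k_hat, s_hat = fit_k_and_s(ranks, data)
print(round(k_hat, 3), round(s_hat, 3))
```

With real data the result depends on which ranks are included in the fit, so the choice of where the straight portion begins and ends is itself a judgment.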
There is another problem in accommodating the mathematical
form of Bradford's law to the observed data. In this case, the issue
involves those titles which contribute only a few articles (or a single
article) each. Empirical data show that there are not as many of these
little-used sources as the theory would predict. If the formula is correct,
then the total number of titles found must be exactly the value of k. In
practice, observed searches fall short of this number.
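
One way to see why the formula ties the number of titles to k is to look for the last rank at which a journal still contributes a whole article. The sketch below rests on assumptions the text does not spell out: natural logarithms, and a least productive title that contributes exactly one article. The function name is hypothetical.

```python
import math

def last_rank_with_one_article(k):
    """Largest rank n whose contribution k * ln(n / (n - 1)) is at least 1."""
    n = 2
    # Advance while the next rank would still contribute a whole article.
    while k * math.log((n + 1) / n) >= 1.0:
        n += 1
    return n

print(last_rank_with_one_article(100))  # 100
print(last_rank_with_one_article(250))  # 250
```

Because k ln(n/(n - 1)) is close to k/n for large n, the contribution falls to one article at about n = k, which is the sense in which the formula predicts k titles in all.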
The data on little-used titles again raise a problem for theorists:
either to modify the statement of the law or to reject the empirical data.
Rejecting the data in this case means assuming that the observed
searches are incomplete. Realistically, however, many of the searches
are well and painstakingly done. It is hard to imagine how they could be
made more complete.
Theorists have chosen to accept the mathematical formula and
reject the empirical data. The reasons for this choice illustrate an
important aspect of the difference between theory and empiricism. The
important factor to theorists is that the mathematical form of Bradford's
law as stated above is very agreeable in a mathematical sense. In its
present form, Bradford's law can be related to other mathematical
models of dispersion. These models include the gamma, Poisson, and
binomial distributions. These other distributions have been extensively
studied. The scattering phenomena which these distributions have been
shown to describe seem related to bibliometric scattering. Thus, in
rejecting the empirical data, theorists are not saying that they believe
that searches are incomplete or that k accurately predicts the number of
titles that will be found. Theorists are instead saying that they believe
that the advancement of understanding lies in the study of certain
mathematical forms. The question of conformity to empirical data is
seen as less important in this situation.


The decision not to alter the mathematical form of Bradford's law
has another advantage in the development of theory. The advantage lies
in the fact that the formula is still assumed to apply to the titles
contributing only a few articles. To a librarian, the journals which
contribute only an occasional article on a topic of interest are of much
less importance than those which regularly have many relevant articles.
Theoretical development requires a slightly different perspective.
Consider the way in which the literature on a new topic develops.
Initially, no journals have any articles on the subject. Then as the field
develops, some journals publish their first article. Of all the journals
that publish a first article on the subject, some fraction will publish a
second article. Similarly, those journals publishing any number of
articles are a fraction of those titles which published one fewer than that
number of articles. Viewed in this manner, the publication of a small
number of articles is a step toward publishing a greater number. This
line of reasoning makes it desirable not to exclude journals contributing
only one or two articles from the development of the theory. In a sense
such items are the base on which the distribution is built.
Brookes noted that in this progression, only those journals which
have succeeded in publishing at some level can have a chance of rising
above that level. Thus, since the competition diminishes, each
remaining journal stands an even better chance of attracting articles. This kind
of "success breeds success" pattern was articulated by Derek de Solla
Price in his cumulative advantage model. This model has the
possibility of adding to our theoretical understanding of Bradford's law. It also
offers a broader understanding of other related bibliometric
distributions. Thus, in scope, this theoretical development goes beyond
Bradford's law to a much broader class of probabilistic phenomena.
Empirical Development
The fundamental question in the empirical study of Bradford's law
is this: What are the implications of the observed pattern for the
provision of user service? This involves two aspects: prediction and
evaluation. Prediction could tell what titles would be useful or how users
would behave. Evaluation could provide a theoretical standard against
which retrieval or acquisition could be measured.
Empirical studies generally begin with a rank-frequency table. The
steps in the creation and interpretation of such a table have appeared
el~ewhere.~
Typically, such a table lists each rank, the number of articles
contributed by the journal of that rank, a cumulative frequency corres-

46

LIBRARY TRENDS

Bradfords Law
ponding to the variable R(n), and a cumulative percentage. From a n
empirical point of view, the cumulative percentage of articles is the
most important. T h e pattern is that a high percentage of the articles
comes from a very small number of journals. At this point any knowledgeable librarian can nod in agreement. Good practice dictates that the
most-used titles must be identifiedand their availability assured. O n the
other hand, there are a large number of titles with low usage. Only the
largest budget could justify holding them all. Yet, it is clear that access
must be provided.
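The construction of such a table can be sketched in a few lines of Python. This is a minimal illustration only: the journal names and article counts are invented, and the function name rank_frequency_table is a label chosen here, not one taken from the literature.

```python
def rank_frequency_table(article_counts):
    """Build a Bradford-style rank-frequency table.

    article_counts maps a journal title to its number of relevant articles.
    Each returned row is (rank, title, articles, cumulative, cumulative_pct).
    """
    # Rank journals from most to least productive. Ties are broken by title
    # only so that the output is deterministic.
    ordered = sorted(article_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(article_counts.values())
    rows = []
    cumulative = 0
    for rank, (title, articles) in enumerate(ordered, start=1):
        cumulative += articles  # running total, i.e. the observed R(n)
        rows.append((rank, title, articles, cumulative,
                     100.0 * cumulative / total))
    return rows


# Hypothetical counts, invented purely for illustration.
counts = {"Journal A": 40, "Journal B": 20, "Journal C": 10,
          "Journal D": 5, "Journal E": 3, "Journal F": 1, "Journal G": 1}
table = rank_frequency_table(counts)
for row in table:
    print("%2d  %-10s %3d %4d %6.1f%%" % row)
```

In this invented example the top-ranked journal alone supplies half of the 80 articles, which is the kind of concentration the cumulative percentage column is meant to expose.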
The discussion above is better classed as conventional wisdom than
as exploitation of a natural law. The challenge (as yet unmet) of empirical studies is to find a way of using quantitative regularity to make
decisions which are more precise than simple intuition would provide.
Before we can say much about using Bradford's law, we must have
some way of knowing if a set of data conforms to the law. This immediately raises problems. In every kind of goodness-of-fit test we need to
have some source of predicted values against which to judge our data.
Thus, we must ask the question: What is Bradford's law? The usual
answer is that it is the formula for R(n) given earlier. But this is not
completely rational. As discussed above, the formula is known to be in
disagreement with empirical observation. Further, the formula
excludes the most-used titles, which in many actual situations may be
the most important. This exclusion is complicated by the fact that
exactly how many titles are to be excluded is undefined. This number is
usually determined by the process of inspection, a rather arbitrary
procedure.
In spite of the problems, the formula given above is generally taken
as the source of expected values. This means that one must obtain values
for k and s, the two constants in the equation. These are obtained by
recognizing that if ideal data were plotted with one axis for R(n)
(cumulative articles) and the other for log (n) (log rank), the result
would be a straight line. The variables k and s represent the slope and
intercept, respectively, of that line. The usual process for obtaining
these values follows. First, the data are plotted on semilogarithmic
graph paper. Next, a straight line is drawn through some central portion of the curve. This offers the investigator an arbitrary choice as to
how much of the data to use and exactly what straight line best fits
those data. The value of the slope (k) is determined for the line. This is
often done by using only two points, thus introducing further arbitrariness. The intercept (s) is obtained either by graphical extrapolation or
by using the slope and a point on the line.
There is an alternate procedure to determine the constants. This
method uses linear regression on the data (or an arbitrarily selected part
of the data). This approach has the advantages of being more replicable
and of using more of the data. The disadvantage is that rank, a clearly
ordinal measure, is treated as if it were on an interval scale. Such an
assumption is not unique to this application, but it must give the
thoughtful researcher reason to pause.
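The regression procedure can be sketched as follows. The sketch assumes that the straight-line form is written R(n) = k ln(n/s), so that k is the slope against ln(n) and s is the rank at which the line crosses R = 0; this notation, the use of natural logarithms, and the function name fit_bradford_line are assumptions made for illustration. The data are synthetic and lie exactly on such a line.

```python
import math


def fit_bradford_line(cumulative, first_rank=1):
    """Least-squares fit of R(n) = k * ln(n / s) (assumed form).

    cumulative holds the observed R(n) for consecutive ranks, the first of
    which is first_rank. Passing a slice with a matching first_rank fits
    only a chosen portion of the curve.
    """
    xs = [math.log(first_rank + i) for i in range(len(cumulative))]
    ys = list(cumulative)
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                     # slope of R(n) against ln(n)
    intercept = mean_y - k * mean_x   # value of the line at ln(n) = 0
    s = math.exp(-intercept / k)      # rank at which the line reaches R = 0
    return k, s


# Synthetic "ideal" data generated with k = 25 and s = 0.5.
ideal = [25 * math.log(n / 0.5) for n in range(1, 51)]
k, s = fit_bradford_line(ideal)
print(round(k, 6), round(s, 6))
```

Note that the fit treats ln(rank) as an interval-scale variable, which is exactly the assumption questioned above; the code makes the procedure replicable but does not remove that objection.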
With the constants determined, expected values of R(n) can be
calculated for each rank. Next, a statistical test must be used to compare
the observed and expected values. This raises another difficulty. On the
one hand, we know that because of the assumptions made, we do not
expect an exact fit. On the other hand, the ranking process imposes an
order on the data so that there will always be some degree of association
between R(n) and n.
The most frequently used test in this situation is the chi-square test.
This requires an arbitrary grouping in order to avoid cells with small
numbers. A greater problem is the tendency of chi-square to find significant differences whenever the sample size is large.8 This is a special
problem in this situation, since we know that some difference between
expected and observed must exist.
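The sensitivity of chi-square to sample size can be seen in a small numerical sketch. The grouped counts below are hypothetical. The statistic is the usual sum of (observed - expected)^2 / expected, and 5.991 is the conventional 5 percent critical value for two degrees of freedom, which applies here only on the assumption that no constants were estimated from the same data.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for matched lists of cell counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical grouped counts: observed and expected differ slightly.
observed_small = [52, 28, 20]
expected_small = [50, 30, 20]
stat_small = chi_square(observed_small, expected_small)

# The same proportions with one hundred times as many observations.
observed_large = [o * 100 for o in observed_small]
expected_large = [e * 100 for e in expected_small]
stat_large = chi_square(observed_large, expected_large)

CRITICAL_5_PERCENT_2_DF = 5.991
print(stat_small < CRITICAL_5_PERCENT_2_DF)   # small sample: not significant
print(stat_large > CRITICAL_5_PERCENT_2_DF)   # large sample: significant
```

The proportional misfit is identical in the two cases; only the sample size differs, and the statistic grows in direct proportion to it.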
An alternative measure is Pearson's correlation. This measure of
variance reduction does not provide an answer as to whether a hypothesis should be accepted or rejected. Thus, the rigid arbitrariness of the
chi-square test is replaced with the arbitrary opinion of the investigator.
Correlation also suffers from the drawbacks of regression analysis on
which it is based. (Note that because the data are ranked, the test for the
significance of a correlation is meaningless.)
Some other measures to test for conformity to Bradford's law have
been proposed. The Kolmogorov-Smirnov test has been proposed as an
alternative to chi-square.9 More experience with this test will be needed
before its worth can be evaluated. Another, more informal approach is
to calculate values of the intercept (s) for a number of observed data
points. Close agreement of these values is taken to indicate a Bradford-type distribution.
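The informal intercept check might be sketched as below. It again assumes the form R(n) = k ln(n/s), from which s = n / exp(R(n)/k) for any single observed point once k is known. The function name and the data are hypothetical.

```python
import math


def implied_intercepts(points, k):
    """Return the value of s implied by each (rank n, cumulative R(n)) pair,
    assuming R(n) = k * ln(n / s)."""
    return [n / math.exp(r_n / k) for n, r_n in points]


# Synthetic points lying exactly on a line with k = 25 and s = 0.5.
k_assumed = 25.0
points = [(n, 25.0 * math.log(n / 0.5)) for n in (5, 10, 20, 40)]
s_values = implied_intercepts(points, k_assumed)
spread = max(s_values) - min(s_values)
print([round(s, 6) for s in s_values], round(spread, 6))
```

How small the spread must be before the agreement counts as "close" is left to the investigator, so this check shares the arbitrariness of the graphical method.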
The statistical problems of identifying a Bradford distribution are
compounded when comparing several sets of empirical data. In this
case, the question is not only the form of the distribution, but also
whether the distributions are the same. One problem is that the constants will produce a shift in the cumulative percentages for each rank.
The nature of this shift is complex because both the number of articles
and the number of titles are shifting. There seems to be no accepted
statistical test for this situation.

Even if the sample sizes are the same, it is still difficult to determine
if two data sets should be considered identical within the limits of
sampling error. This problem frequently arises when samples are taken
in the same situation but at different times. Some of the variation in the
rankings of titles will be due to sampling error. But changes in rank may
also reflect real changes in the use of a title. The sample sizes required to
resolve this issue are very large indeed. For example, Brookes has calculated that to achieve a 95 percent confidence level that two adjacent titles
should not reverse their order, a sample size of several thousand is required
if the titles are high (e.g., 5 or 6) in the ranking.10 The resolution
of lower-ranked pairs requires much larger samples (tens or hundreds of
thousands). Consideration of these sample sizes should make any
researcher cautious in accepting the accuracy of empirical data.

The Gap Between


The title of this article alludes to a gap between theoretical studies
of Bradford's law and empirical research. The gap is this: none of the
variables which characterize the empirical situation have been shown to
relate to the theoretical model. These include variables which describe
the field or topic being researched, the way the search is conducted, the
specific needs of the user, or the characteristics of the collections
involved. This is a rather peculiar situation. Anyone with practical
experience in information retrieval recognizes that these parameters are
important in providing high-quality service. It is almost contraintuitive to find that none of these variables are reflected in the theoretical study of Bradford's law.
There is an important limitation to the gap described above. It is
well known that the size of the set of retrieved items (in terms of both
total articles and total journal titles) is related to the theoretical model.
The number of articles is strongly related to the slope (constant k in the
equation), and the number of titles is somewhat related to the intercept
(constant s). Thus, any aspect of the empirical situation which affects
these values will have a tie to the theoretical model. For example, the
generality or specificity of the topic (for a given field) may affect the
number of items retrieved. In such a case, the topic breadth will seem to
affect the model. In fact, this effect is related to a change in the number of
articles and titles, not to intellectual characteristics of the topic.
This relationship leads to some very odd conclusions for the
unwary investigator. For example, Pratt has proposed a measure of the
degree to which articles in a particular field are concentrated within the
literature.11 The claim is made that this index can be used with
Bradford-type data. (The claim is actually made for Zipf-type data, a
mathematically identical distribution.) But Pratt's index depends on
the number of titles in the sample. Consider two sets of data on exactly
the same topic: for example, Lawani's searches on tropical agriculture
for one year and four years.12 The Pratt index, affected by sample size,
would lead to the conclusion that tropical agriculture is a more concentrated field than tropical agriculture.
A failure to recognize that data are subject to sampling error can
also produce meaningless applications of Bradford's law. For example, Goffman and Morris propose that circulation samples from a
journal collection be used to predict the distribution of use for the next
year.13 They propose a one- to three-month sampling period and give an
example with a sample size of 876. They claim a core of eleven titles.
They do not actually make a prediction or test it. According to Brookes,
the appropriate sample size for this situation is about 25,000. Given the
huge undersampling proposed, the Goffman and Morris study is better
classed as an application of common sense rather than any use of
Bradford's law.
Aside from the misuse of Bradford's law, the question arises as to
whether the gap between theory and practice is simply due to the fact
that more research findings are needed. This corresponds to the
hypothesis that empirical variables (those which characterize the intellectual dimensions of retrieval) can be incorporated into the theoretical
model. The alternate hypothesis is that the role of the empirical variables is only to define those situations for which the model can be
expected to hold. In this case, the empirical variables are constraints or
limits but not an actual part of the theoretical model. One area of
empirical data which may shed light on this gap is the behavior of the
most popular journal titles. In the discussion of theoretical development earlier in this paper, it was noted that in some empirical situations
the most frequently occurring titles contribute fewer articles than would
be expected. A proposed interpretation of this divergence is that the top
journals become saturated with articles on the topic. This explanation seems very reasonable, but has never been substantiated.
If empirical variables such as the size, areas of specialization, and
editorial policies of the top journals have an effect, then it should be
possible to relate different levels of saturation to different empirical
circumstances. This would serve, finally, to tie the theoretical model to
empirical parameters.

Summary
The literature on Bradford's law presents the casual reader with a
number of pitfalls. The first problem is to distinguish theoretical from
empirical research. Theoretical work is aimed at understanding a random probabilistic process. To this end, assumptions are made which aid
mathematical manipulation. Empirical studies concentrate on describing the world from a practitioner's point of view. In these studies the
descriptive qualities of the data are more important than the statistical
aspects. A second problem is the large number of marginal claims in
the literature, that is, claims which are clearly speculative or are simply
unsupported. Some of this writing is not intended for acceptance without further study. Other articles are simply weak scholarship. In both
cases the reader must decide what to reject.
Between theory and empiricism lies a gap. This gap is the fact that
at present, the intellectual richness of real situations is not represented
in the mathematical austerity of the theoretical equations. It remains to
be seen if this gap can be bridged by further research.
Overall, Bradford's law represents an elusive phenomenon. On one
hand, it is easy to observe in real situations and can be represented with a
fairly simple mathematical formula. On the other hand, Bradford-type
data resist statistical testing, and the model fails to reveal the underlying
process which causes the distribution. In any case, the wise reader will
examine any study of Bradford's law closely before rushing to believe
more than is actually stated and supported.

References
1. Bradford, Samuel C. "Sources of Information on Specific Subjects." Engineering
137(26 Jan. 1934):85-86; and ________. Documentation. Washington, D.C.: Public
Affairs Press, 1950.
2. Vickery, B.C. "Bradford's Law of Scattering." Journal of Documentation 4(Dec.
1948):198-203.
3. Kendall, M.G. "The Bibliography of Operational Research." Operational
Research Quarterly 11(March/June 1960):31-36.
4. Leimkuhler, Ferdinand F. "The Bradford Distribution." Journal of Documentation 23(Sept. 1967):197-207.
5. Brookes, Bertram C. "The Derivation and Application of the Bradford-Zipf
Distribution." Journal of Documentation 24(Dec. 1968):247-65; ________. "Bradford's
Law and the Bibliography of Science." Nature 224(6 Dec. 1969):953-56; and ________.
"Obsolescence of Special Library Periodicals: Sampling Errors and Utility Contours."
Journal of the ASIS 21(Sept.-Oct. 1970):320-29.
6. Price, Derek de Solla. "A General Theory of Bibliometric and Other Cumulative
Advantage Processes." Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
7. Drott, M. Carl, et al. "Bradford's Law and Libraries: Present Applications-Potential Promise." Aslib Proceedings 31(June 1979):296-304.
8. Mosteller, Frederick, and Wallace, David L. Inference and Disputed Authorship:
The Federalist. Reading, Mass.: Addison-Wesley, 1961.
9. Brookes, Bertram C. "Theory of the Bradford Law." Journal of Documentation
33(Sept. 1977):180-209.
10. Ibid.
11. Pratt, Allan D. "A Measure of Class Concentration in Bibliometrics." Journal of
the ASIS 28(Sept. 1977):285-92.
12. Lawani, S.M. "Periodical Literature of Tropical and Subtropical Agriculture."
Unesco Bulletin for Libraries 26(March-April 1972):88-93; and ________. "Bradford's
Law and the Literature of Agriculture." International Library Review 5(July 1973):
341-50.
13. Goffman, William, and Morris, Thomas G. "Bradford's Law Applied to the
Maintenance of Library Collections." In Introduction to Information Science, edited by
Tefko Saracevic, pp. 200-03. New York: Bowker, 1970.

Empirical and Theoretical Bases of Zipf's Law


RONALD E. WYLLYS

Introduction
ONE OF THE MOST PUZZLING phenomena in bibliometrics (and, more
broadly, in quantitative linguistics) is Zipf's law. As one commentator,
the statistician Gustav Herdan, has put it: "Mathematicians believe in
[Zipf's law] because they think that linguists have established it to be a
linguistic law, and linguists believe in it because they, on their part,
think that mathematicians have established it to be a mathematical
law."
Let us start by considering a basic form of Zipf's law. Suppose one
has a natural-language corpus, e.g., a book written in English. Next,
suppose one makes a frequency count of the words in the corpus, i.e.,
counts the number of occurrences of "the," "and," "of," etc. Finally, suppose
one arranges the words in decreasing order of frequency so that the most
frequent word has rank 1; the next most frequent, rank 2; and so on.
For example, a frequency count of the 75 word-types (i.e., dictionary entries) represented by the 142 word-tokens (i.e., distinct occurrences) in the two preceding paragraphs yields the partial results shown in
table 1. This set of rank-ordered frequency counts, though quite small
for the purpose, serves moderately well as an illustration of the fact that
rank and frequency have a surprisingly constrained relationship in
natural-language corpora. The values of the products of rank r and
frequency f fall in the relatively limited range 27-30 in the middle of
table 1, and we may note that there was no a priori reason for us to expect
that the middle products rf would fall within so limited a range.
Ronald E. Wyllys is Associate Professor, Graduate School of Library and Information
Science, University of Texas at Austin.

TABLE 1

Rank r              Word-Type               Frequency f   Product rf
1                   the                          9            9.0
2-3, mean=2.5       in, of                       7           17.5
4-5, mean=4.5       a, one                       6           27.0
6                   law                          5           30.0
7-8, mean=7.5       and, it                      4           30.0
9-11, mean=10.0     suppose, that, Zipf's        3           30.0
12-32, mean=22.0    (21 words)                   2           44.0
33-75, mean=54.0    (43 words)                   1           54.0
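A count of this kind can be sketched in Python. The tokenizing rule (lower-cased runs of letters and apostrophes) and the sample sentence are assumptions made for illustration; the rule used to prepare table 1 is not stated. Word-types of equal frequency share the mean of the ranks they occupy, as in the table.

```python
import re
from collections import Counter


def rank_frequency_products(text):
    """Return rows (mean_rank, frequency, product, words), one row per
    group of word-types that share the same frequency."""
    words = re.findall(r"[a-z']+", text.lower())  # assumed tokenizing rule
    counts = Counter(words)

    # Collect the word-types occurring with each frequency.
    by_frequency = {}
    for word, freq in counts.items():
        by_frequency.setdefault(freq, []).append(word)

    rows = []
    next_rank = 1
    for freq in sorted(by_frequency, reverse=True):
        group = sorted(by_frequency[freq])
        first, last = next_rank, next_rank + len(group) - 1
        mean_rank = (first + last) / 2   # tied types share the mean rank
        rows.append((mean_rank, freq, mean_rank * freq, group))
        next_rank = last + 1
    return rows


# A toy sentence, invented for illustration.
rows = rank_frequency_products("The cat and the dog and the bird")
for mean_rank, freq, product, group in rows:
    print(mean_rank, freq, product, ", ".join(group))
```

A corpus this small cannot show the regularity itself; the sketch only makes the bookkeeping of ranks, ties and products explicit.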

The constrained relationship between the frequency of a word in a
corpus and its rank gained wide attention in the 1930s and 1940s
through the work of George Kingsley Zipf (1902-1950), a professor of
philology at Harvard University. The name "Zipf's law" has been given
to the following approximation of the rank-frequency relationship:
$rf = c$   (1)
where r is the rank of a word-type, f is the frequency of occurrence of the
word-type, and c is a constant, dependent on the corpus (often around
one-tenth of the total size of [i.e., number of word-tokens in] the corpus).
When stated algebraically, Zipf's law is usually given in the form of
equation (1), but the law is probably most familiar in the graphic
representation of a mathematically equivalent form:
$\log r + \log f = \log c$   (2)
The dashed line in figure 1 illustrates what an idealized display of Zipf's
law in the form of equation (2) might be. More generally, analytic
geometry tells us that the equation of an arbitrary line whose slope is -B
can be written as:
$B(\log r) + \log f = \log c$   (3)
One such line is pictured by the solid line in figure 1, which has a slope
of -0.92. (The relationship of this line to the data points will be discussed
later.) If we write equation (3) in a form like that of equation (1), we
have:
$r^B f = c$   (4)
Note that if B takes on the particular value 1, then equation (4) becomes
identical with equation (1). Thus, equation (4) is a generalization of
Zipf's law, and we shall refer to it as the generalized Zipf's law.

[Figure 1: plot of the observed rank-frequency pairs; horizontal axis, LOG RANK]
Fig. 1. Observed Rank-Frequency Pairs for a Corpus of 21,354 Words
The solid line is the regression line for the data and has slope -0.92; the dashed
line has slope -1.0.
Source: Wyllys, Ronald E. "The Measurement of Jargon Standardization in Scientific
Writing Using Rank-Frequency ('Zipf') Curves." Ph.D. diss., University of Wisconsin-Madison, 1974.

It should be noted that Zipf's law only approximates the relationship between rank r and frequency f for any actual corpus. Zipf's work
shows that the approximation is much better for the middle ranks than
for the very lowest and the very highest ranks, and his work with
samples of various sizes suggests that the corpus should consist of at
least 5000 words in order for the product rf to be reasonably constant,
even in the middle ranks.
If one performs a frequency count on an actual corpus, arranges the
words in decreasing order of frequency, and draws the resulting pairs of
points by plotting the logarithm of rank on the horizontal axis and the
logarithm of frequency on the vertical axis, the resulting points will
form a slightly curved line. Such plots are known as Zipf curves. An
example of a Zipf curve is shown in figure 1.
One can speak of the "slope of a Zipf curve" by finding a straight
line that closely approximates the points of the curve and then taking
that straight line's slope as the slope of the curve. Apparently Zipf
himself fitted straight lines to his data by visual judgment only. Finding
their slopes to be ordinarily close to -1, he appears to have assumed that
the true slope of such curves was -1 and, hence, that equations (1) and
(2), rather than the more general equations (3) and (4), were correct.
This assumption is questionable, as will be discussed later.
The study of Zipf's law can be broken into three areas: (1) the initial
discovery that equation (1) does approximate the relationship between
rank and frequency, (2) investigation of whether a better approximation
exists, and (3) attempts to provide a satisfactory rationale for the close
relationship of rank and frequency.

The Discovery of Zipf's Law


The work that led to Zipf's law started when Zipf was a graduate
student at Harvard in the 1920s. Studying phonetic changes in languages, he became interested in the frequency of use of phonemes as a
factor in their tendency to change phonetically over long periods of
time. From the relative frequencies of phonemes, he moved to studies of
the relative frequencies of words, and in 1932 published a book, Selected
Studies of the Principle of Relative Frequency in Language. Of the
approximately 125 pages in this book, over 100 are either diagrams or
lists of words and their frequencies. About 22 pages are devoted to prose,
which includes this passage of justification:
Some have taken exception to the Principle of Relative Frequency
simply because it is statistical. For statistics are hateful to the human
mind; they are painfully definite for the group without being particularly definite for the individual. Undoubtedly, a primary law which
knows no fluctuation within itself is pleasanter. If nature had consulted man in the matter, we should all have suggested primary
laws....But nature did not consult us...and has seen fit to let the laws of
chance govern vast portions of the basic order of the physical universe,
as well as no small amount of the biological.
It is interesting to note that, unfortunately, the critics of quantitative
analysis are still very much with us nearly fifty years later.
In his next book, The Psycho-Biology of Language, published in
1935, Zipf called attention for the first time to the phenomenon that has
come to bear his name. This book contained Zipf's first diagram of the
log(frequency)-v.-log(rank) relationship, a Zipf curve for his count of
words in the Latin writings of Plautus.
Zipf's last book, Human Behavior and the Principle of Least Effort:
An Introduction to Human Ecology, appeared in 1949. As its title
indicates, this work is an exposition of what Zipf considered the fundamental reason for much of human behavior: the striving to minimize
effort. The diversity of phenomena to which Zipf was able to apply his
mathematical models, equations (1) and (2), is impressive.
Despite his strong defense of quantification, Zipf really did not
argue in quantitative terms. It is true that he performed counts of
linguistic phenomena, tabulated the counts, and displayed them. But
his mathematics were weak, and his energies were spent in philosophizing about the implications of his principles. Support for this comment
may be found in another passage from Selected Studies: "Before returning to linguistic considerations, let me say here for the sake of any
mathematician who may plan to formulate the ensuing data more
exactly, the ability of the highly intense positive to become the highly
intense negative, in my opinion introduces the devil into the formula in
the form of [the square root of -1]. And now to linguistics."
Zipf appears to this writer to have been poorly trained for dealing
with quantitative phenomena. His knowledge of mathematics was
minimal; of statistics, apparently nonexistent. He never showed interest
in exploring the quantitative nature of his data beyond noting that they
came close to his model of the moment. This done, he would launch
into lengthy speculations about hazily defined possible causes. It is a
pity that he almost never collaborated with statisticians. On the other
hand, he was an indefatigable worker, and pursued the rank-frequency
phenomenon and related ideas for twenty years despite often harsh
criticism. There can be little doubt that the ubiquity of these phenomena would be less well recognized were it not for his work.
Alternative Forms of Zipf's Law

In Human Behavior and the Principle of Least Effort, Zipf presented an interesting exception to his usual insistence that the slope of
linguistic Zipf curves is -1, i.e., that only equation (1), and not equation
(4), applies to linguistic data. He noted that frequency counts of the
language of schizophrenics showed a different slope, commenting that
"of all the rank-frequency data on words that have ever come to the
attention of the present writer, only those of [two schizophrenics] have
negative slopes...greater than unity."9 Considering how poorly straight
lines of slope -1 fit most of Zipf's other examples, one wonders why he
found the departures of the schizophrenics' slopes from -1 to be
remarkable.
In fact, the slopes of Zipf curves, when measured more carefully
than by Zipf's eye, turn out to be capable of considerable divergence
from -1. An obvious way of fitting a straight line to a Zipf curve, i.e., to a
set of pairs of observations of log(frequency) and log(rank) for a corpus,
is by linear regression, with log(rank) playing the role of the independent variable. A study by the present writer using this technique found
slopes ranging from -0.89 to -1.04 among only eight corpora. Figure 1,
taken from this study, shows a plot of log(frequency) v. log(rank) for a
corpus of 21,354 words from issues of the Psychological Review for 1969,
together with the regression line of best fit to these points. The regression line, shown as a solid line, has a slope of -0.92; for comparison,
figure 1 also shows a dashed line whose slope is -1.
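The regression just described can be sketched as follows. This is an ordinary least-squares fit of log(frequency) on log(rank), written out by hand; it is not the program used in the study cited. The function name zipf_slope is hypothetical, and the frequencies are synthetic values constructed to follow equation (4) exactly with B = 0.92. For simplicity, tied frequencies would receive successive ranks here rather than a shared mean rank.

```python
import math


def zipf_slope(frequencies):
    """Regress log10(frequency) on log10(rank).

    frequencies must be in decreasing order; position i holds rank i + 1.
    Returns (slope, intercept) of the fitted line.
    """
    xs = [math.log10(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log10(f) for f in frequencies]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic frequencies with c = 1000 and B = 0.92 in equation (4).
frequencies = [1000 * rank ** -0.92 for rank in range(1, 201)]
slope, intercept = zipf_slope(frequencies)
B_estimate = -slope           # exponent B of the generalized law
c_estimate = 10 ** intercept  # constant c of the generalized law
print(round(B_estimate, 4), round(c_estimate, 2))
```

With real counts the points curve away from the line at both ends, so the fitted slope depends on which ranks are included in the regression.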
In general, diagrams of the log(frequency)-v.-log(rank)
relationship for natural-language data typically show a downward concavity
for the low ranks. The full set of products rf typically shows a fairly
consistent slow rise in the values of rf as r increases, rather than any
readily identifiable constant value. Thus, equation (2) seems to represent actual data less accurately than does the generalized Zipf's law,
equation (4):
$r^B f = c$   (4)
where B < 1. Note that if the product rf gradually increases with increasing r, the effect of giving r an exponent that is less than 1 will be to
make $r^B$ increase less rapidly than r, thus helping to keep the product
$r^B f$ more nearly constant. This will tend to hold the left-hand side of
equation (4) more or less in balance with the constant-valued right-hand side.
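The balancing effect of the exponent can be checked numerically. In the sketch below the frequencies are synthetic, generated to satisfy equation (4) exactly with B = 0.9 and c = 5000; both values are arbitrary choices for illustration. The coefficient of variation is used here merely as a convenient measure of how nearly constant a set of products is.

```python
import statistics


def relative_spread(values):
    """Coefficient of variation: population standard deviation over mean."""
    return statistics.pstdev(values) / statistics.mean(values)


B = 0.9
ranks = range(1, 1001)
freqs = [5000 * r ** -B for r in ranks]   # synthetic, follows equation (4)

plain_products = [r * f for r, f in zip(ranks, freqs)]             # r f
general_products = [r ** B * f for r, f in zip(ranks, freqs)]      # r^B f

# The plain products rise slowly with rank; the generalized ones stay flat.
print(round(plain_products[0]), round(plain_products[-1]))
print(relative_spread(plain_products) > relative_spread(general_products))
```

For data of this kind the products rf roughly double between rank 1 and rank 1000, while the products of equation (4) remain at c.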
For the reasons just sketched, it seems clear that one should not
expect equation (1) to be as satisfactory a description of Zipf curves for
actual data as is equation (4) with B expected to differ from 1 ordinarily.
Benoit Mandelbrot has published several studies of generalizations of
Zipf's law, dealing both with the question of whether the slope is -1 and
with the deeper problem of explaining why the rf products should be
relatively constant (his work on this latter problem will be discussed
later). Mandelbrot seized upon the idea that B could vary, and related B
to the diversity of a corpus (viz., the ratio of the number of word-types to
the number of word-tokens in the corpus), holding that B tended to vary
inversely with the diversity.
Mandelbrot also developed a further refinement of Zipf's law:
$(r+m)^B f = c$   (5)
where r is the rank of a word, f is its frequency, and m, B, and c are
constants dependent on the corpus.12 The key idea in this version is that
m has its greatest effect when r is small, and that equation (5) therefore
provides a better fit to typical data, especially to the low-rank, high-frequency words, than do equations (1) or (4).
An even more general formulation of the relationship of rank and
frequency is due to H.P. Edmundson, whose "3-parameter rank distribution"13 is:
$f(r; c, b, a) = c(r+a)^{-b}, \quad c > 0,\ b > 0,\ a \ge 0$   (6)
where f is the frequency associated with rank r, and where a, b and c are
constants. Equation (6) contains Zipf's and Mandelbrot's versions as
special cases.
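The way in which equation (6) contains the earlier forms can be illustrated with a short sketch. Equation (6) is read here as f = c(r + a)^(-b); the function names and the parameter values are hypothetical, chosen only to make the special cases visible.

```python
import math


def edmundson_frequency(r, c, b, a):
    """Equation (6), read as f = c * (r + a) ** (-b)."""
    if not (c > 0 and b > 0 and a >= 0):
        raise ValueError("requires c > 0, b > 0 and a >= 0")
    return c * (r + a) ** (-b)


def zipf_frequency(r, c):
    """Equation (1), r f = c, solved for f."""
    return c / r


def mandelbrot_frequency(r, c, B, m):
    """Equation (5), (r + m)^B f = c, solved for f."""
    return c / (r + m) ** B


c = 1000.0
# Setting b = 1 and a = 0 recovers Zipf's law; b = B and a = m recovers
# Mandelbrot's refinement.
same_as_zipf = math.isclose(edmundson_frequency(7, c, 1.0, 0.0),
                            zipf_frequency(7, c))
same_as_mandelbrot = math.isclose(edmundson_frequency(7, c, 0.92, 2.0),
                                  mandelbrot_frequency(7, c, 0.92, 2.0))

# The shift m matters most at low ranks: compare the shifted and unshifted
# frequencies at rank 1 and at rank 1000 (here with B = 1 and m = 2).
ratio_low = mandelbrot_frequency(1, c, 1.0, 2.0) / zipf_frequency(1, c)
ratio_high = mandelbrot_frequency(1000, c, 1.0, 2.0) / zipf_frequency(1000, c)
print(same_as_zipf, same_as_mandelbrot, round(ratio_low, 3), round(ratio_high, 3))
```

The two ratios show why the added constant mainly alters the fit for the low-rank, high-frequency words and leaves the high ranks almost untouched.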
The Search for a Rationale for Zipf's Law
Why should there be such a surprisingly constrained relationship
between rank and frequency for natural-language corpora? The problem is more complicated than this question suggests. There are many
other phenomena that exhibit similar distributions; Abraham Bookstein has provided two unifying surveys of them.14 Commenting on the
ubiquity of such distributions, Herbert Simon has mentioned "distributions of scientists by number of papers published, ...of cities by population, ...of incomes by size, and ...of biological genera by number of
species."15 He observed that "one is led to the conjecture that if these
phenomena have any property in common it can only be a similarity in
the structure of the underlying probability mechanisms."16 At present,
it is probably fair to say that there is not yet complete agreement about
why these phenomena share similar distributions or why the distributions exhibit the behavior known as Zipf's law.
Zipf thought the reason lay in his Principle of Least Effort, which
he defined as follows:
The Principle of Least Effort means...that a person...will strive to
solve his problems in such a way as to minimize the total work that he
must expend in solving both his immediate problems and his probable future problems. That in turn means that the person will strive to
minimize the probable average rate of his work-expenditure (over
time). And in so doing he will be minimizing his effort, by our
definition of effort. Least effort, therefore, is a variant of least work.17
(Italics in original.)
Unfortunately, Zipf never provided a clear logical development from
this principle to equation (1).
Intellectually much more satisfying than Zipf's principle is the
approach of Mandelbrot, who used ideas from information theory to
explain the rank-frequency phenomenon. The essence of Mandelbrot's
contribution was his considering communication costs of words in
terms of the letters that spell the words and the spaces that separate
them. This cost increases with the number of letters in a word and, by
extension, in a message. Mandelbrot showed that Zipf's law, equation
(1), follows as a first approximation from the minimization of communication costs in terms of letters and spaces. Linguistically, this amounts
to minimizing costs in terms of phonemes, which is why the phenomenon holds for both written and spoken language. Mandelbrot's more
accurate second approximation has been shown in equation (5).
Many attempts have been made to provide other rationales for the
Zipf phenomenon. Most of them are probabilistic in their approach,
i.e., they consist of derivations, from various premises, of the probability
that a word will occur with a certain frequency in an arbitrary corpus.
The frequencies can, at least in concept, be ranked and thus be made to
imply probabilities that a certain rank r will be associated with a certain
frequency f; however, the implication may be difficult to make explicit.
In the space available here, only the nature of these attempts can be
sketched; the principal goal is to emphasize their variety and, hence, the
inconclusive current state of explanations of Zipf's law.
One such attempt involved the combined efforts of Herdan, J.O.
Irwin, and an eighteenth-century British mathematician, Edward
Waring. Herdan presents the model as:
$p_f = \frac{x-a}{x}$   for f = 1   (7.1)
$p_f = \frac{x-a}{x} \cdot \frac{a(a+1)\cdots(a+f-2)}{(x+1)(x+2)\cdots(x+f-1)}$   for f = 2, 3, ...   (7.2)
where $p_f$ is the probability that a word will appear with frequency f in a
large corpus, and a and x are constants, dependent on the corpus, such
that 0 < a < x. The function is due to Irwin,19 who discovered it in a
search for distributions useful in biology, and who credited Waring
with discovery of the basic inverse factorial expansion underlying the
probability function. Since it was Herdan who recognized that Irwin's
result had linguistic applications, the function has come to be known as
the Waring-Herdan formula in linguistics. Several investigators have
reported that it fits observed rank-frequency data well. Good fits to
observed rank-frequency data by another model, the lognormal distribution, have been reported by V. Belevitch and John B. Carroll.21
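The Waring-Herdan probabilities are easily computed step by step. The sketch assumes the usual statement of the Waring distribution, in which p_1 = (x - a)/x and each later term follows from the one before it by p_(f+1) = p_f (a + f - 1)/(x + f). The function name and the values a = 1, x = 3 are chosen only for illustration.

```python
def waring_herdan(a, x, max_f):
    """Return [p_1, ..., p_max_f] for the Waring-Herdan formula (assumed
    form), where p_f is the probability that a word has frequency f."""
    if not 0 < a < x:
        raise ValueError("requires 0 < a < x")
    probs = [(x - a) / x]                 # p_1
    for f in range(1, max_f):
        # Each term is obtained from its predecessor.
        probs.append(probs[-1] * (a + f - 1) / (x + f))
    return probs


# Illustrative constants; real values would be estimated from a corpus.
probs = waring_herdan(a=1.0, x=3.0, max_f=1000)
total = sum(probs)
print(round(probs[0], 4), round(probs[1], 4), round(total, 5))
```

With these constants two-thirds of the word-types are expected to occur exactly once, and the probabilities over the first thousand frequencies already sum to very nearly 1.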
Bruce M. Hill and Michael Woodroofe23 have pursued the derivation of a probabilistic form of Zipf's law by applying Bose-Einstein and
Maxwell-Boltzmann statistics to the classical occupancy problem. A
similar derivation has been offered by Yuji Ijiri and H.A. Simon.24
These papers employ various initial conditions to yield various of the
Zipf, Bradford and other related distributions. The interrelatedness of
these distributions has been shown by, inter alios, Bertram C. Brookes25
and Robert A. Fairthorne.26
A different starting point has been suggested by H.S. Sichel. He
assumes that each word in "...[an author's vocabulary has] a long-term
probability of occurrence."27 The mixing of thousands of such probabilities during the production of speech or writing can be expressed as a
compound Poisson probability, of which "a number of known [distribution functions] such as the Poisson, negative binomial, geometric,
Fisher's logarithmic, ...Yule, Good, Waring and Riemann distributions
are ...limiting forms."28 Sichel reports very close fits of his model to some
twenty published frequency counts. A related paper by B.C. Brookes29
treats a model of a very mixed Poisson process, and another article by
Brookes and Jose M. Griffiths30 derives from this process a frequency-transfer coefficient as a means of measuring the correlation of frequency and rank. Empirical tests of the theories are sufficiently rare that
reports of such tests by Beth Krevitt and Belver C. Griffith31 and by Anita
Parunak32 deserve mention.
The negative binomial distribution has been the starting point for
other investigations, including one by B.M. Hill treating the number-of-species problem but mentioning its relation to Zipf's law.33 A major
effort along these lines is that of Derek de Solla Price, who has developed
a modification of the negative binomial that he calls the cumulative
advantage distribution (CAD). In the CAD the conditions of the negative binomial are modified so that "success increases the chance of
further success, but unlike in the negative binomial: failure has no
subsequent effect in changing probabilities ....Failure does not constitute an event as does success. Rather it must be accorded the status of a
non-event; thus lack of publication is a non-event and only publication becomes a markable event."34
Rephrasing this for words rather
than publications, we can say that if at a certain point in writing a
corpus an author uses a given word, it seems plausible that the chance of
his or her using that word again in the corpus is increased, whereas the
author's failure to use some other word at that point says essentially
nothing about the chance that this other word will be used later in the
corpus. As a probability density function for the CAD, Price derives a
modified Beta function. Further comments on the CAD have been made
by Paul B. Kantor, Price and I.K. Ravichandra Rao.35 Closely related is
the contagious Poisson process of Paul D. Allison.36
Conclusion
What is our present state of knowledge about Zipf's law? Its remarkable range of applicability to diverse phenomena continues to amaze us,
but we have come far along the road toward an understanding of why it
should exist and why it should be so widespread.
It seems intuitively plausible that some kind of general Poisson
process should underlie the pervasiveness of Zipf's law and its siblings,
such as the Bradford and Lotka laws discussed elsewhere in this issue.
After all, these laws deal with phenomena that we can characterize as
consisting of the occurrence of events whose individual probabilities are
ordinarily quite small and, hence, can be expected to behave in a
Poisson-like fashion. Even Zipf's hazy Principle of Least Effort can be
interpreted as a groping toward a Poisson process, in that the principle
suggests that people find it easier to choose to use familiar, rather than
unfamiliar, words and that the probabilities of occurrence of familiar
words are therefore higher than those of less familiar ones.
On the other hand, it is clear that the process cannot be a pure
Poisson process, since the choices of words are not independent, as the
Poisson distribution requires. Already in 1955 Simon recognized this in
employing "a stochastic model in which the probability that a particular word will be the next one written depends on what words have been
written previously."37
Practically all the work on developing a rationale for Zipf's law has
involved probabilistic models related to the Poisson in some fashion.
Among these models is Price's cumulative advantage distribution,
which the present writer finds very persuasive. Research on a rationale
for Zipf's law has not yet achieved a consensus, but we are probably close
to one.

What implications does Zipf's law have for the design of information systems? The honest answer has to be few, if any. So far as vocabulary control is concerned, Zipf's law offers no useful information beyond
what frequency-counts alone can easily supply. The present writer has
suggested that different subject-fields may be characterized by different
slopes of Zipf curves,38 but again this possibility seems to have no
practical applications at present in information system design. Perhaps
such applications will develop in the future. Meanwhile, we can continue to surprise ourselves with the ubiquity of the Zipf phenomenon
and to enjoy the intellectual challenge of achieving a full, rational
understanding of it.

References
1. Herdan, Gustav. The Advanced Theory of Language as Choice and Chance. Berlin: Springer-Verlag, 1966, p. 33.
2. See, for example, Zipf, George K. Human Behavior and the Principle of Least Effort. Cambridge, Mass.: Addison-Wesley, 1949. Reprint ed., New York: Hafner, 1965.
3. Ibid., p. 291.
4. Zipf, George K. Selected Studies of the Principle of Relative Frequency in Language. Cambridge, Mass.: Harvard University Press, 1932.
5. Ibid., p. 9.
6. Zipf, George K. The Psycho-Biology of Language. Boston: Houghton Mifflin, 1935. Reprint ed., Cambridge, Mass.: MIT Press, 1965.
7. Zipf, Human Behavior.
8. Zipf, Selected Studies, p. 21.
9. Zipf, Human Behavior, pp. 295-96.
10. Wyllys, Ronald E. "The Measurement of Jargon Standardization in Scientific Writing Using Rank-Frequency (Zipf) Curves." Ph.D. diss., University of Wisconsin-Madison, 1974.
11. Mandelbrot, Benoit. "Structure formelle des textes et communication." Word 10(1954):1-27, 424-25.
12. Mandelbrot, Benoit. "An Informational Theory of the Statistical Structure of Language." In Communication Theory: Papers Read at a Symposium on Applications of Communication Theory, edited by Willis Jackson, pp. 486-502. London: Butterworths, 1953.
13. Edmundson, Harold P. "The Rank Hypothesis: A Statistical Relation between Rank and Frequency." Technical report TR-186. College Park: Computer Science Center, University of Maryland, 1972.
14. Bookstein, Abraham. "The Bibliometric Distributions." Library Quarterly 46(Oct. 1976):416-23; and Bookstein, Abraham. "Explanations of the Bibliometric Laws." Collection Management 3(Summer/Fall 1979):151-62.
15. Simon, Herbert A. "On a Class of Skew Distribution Functions." Biometrika 42(Dec. 1955):425. Reprinted in Models of Man: Social and Rational. New York: Wiley, 1957, pp. 145-64. See also Simon, Herbert A. "Some Further Notes on a Class of Skew Distribution Functions." Information and Control 3(March 1960):80-88.
16. Simon, "On a Class of Skew Distribution Functions," p. 425.
17. Zipf, Human Behavior and the Principle of Least Effort, p. 1.
18. Herdan, Gustav. Quantitative Linguistics. London: Butterworths, 1964, pp. 85-88.
19. Irwin, Joseph O. "The Place of Mathematics in Medical and Biological Statistics." Journal of the Royal Statistical Society, Series A 126(1962):1-41.
20. Belevitch, V. "On the Statistical Laws of Linguistic Distributions." Annales de la Société Scientifique de Bruxelles, Series I 73(18 Dec. 1959):310-26.
21. Carroll, John B. "On Sampling from a Lognormal Model of Word-Frequency Distributions." In Computational Analysis of Present-Day American English, edited by Henry Kučera and W. Nelson Francis, pp. 406-24. Providence: Brown University Press, 1967.
22. Hill, Bruce M. "Zipf's Law and Prior Distributions for the Composition of a Population." Journal of the American Statistical Association 65(Sept. 1970):1220-32; and Hill, Bruce M. "The Rank-Frequency Form of Zipf's Law." Journal of the American Statistical Association 69(Dec. 1974):1017-26.
23. Hill, Bruce M., and Woodroofe, Michael. "Stronger Forms of Zipf's Law." Journal of the American Statistical Association 70(March 1975):212-19.
24. Ijiri, Yuji, and Simon, Herbert A. "Some Distributions Associated with Bose-Einstein Statistics." Proceedings of the National Academy of Sciences 72(May 1975):1654-57.
25. Brookes, Bertram C. "The Derivation and Application of the Bradford-Zipf Distribution." Journal of Documentation 24(Dec. 1968):247-65.
26. Fairthorne, Robert A. "Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction." Journal of Documentation 25(Dec. 1969):319-43.
27. Sichel, H.S. "On a Distribution Law for Word Frequencies." Journal of the American Statistical Association 70(Sept. 1975):543.
28. Ibid.
29. Brookes, Bertram C. "Theory of the Bradford Law." Journal of Documentation 33(Sept. 1977):180-209.
30. Brookes, Bertram C., and Griffiths, Jose M. "Frequency-Rank Distributions." Journal of the ASIS 29(Jan. 1978):5-13.
31. Krevitt, Beth, and Griffith, Belver C. "A Comparison of Several Zipf-Type Distributions in their Goodness of Fit to Language Data." Journal of the ASIS 23(May-June 1972):220-21.
32. Parunak, Anita. "Graphical Analysis of Ranked Counts (of Words)." Journal of the American Statistical Association 74(March 1979):25-30.
33. Hill, Bruce M. "Posterior Moments of the Number of Species in a Finite Population and the Posterior Probability of Finding a New Species." Journal of the American Statistical Association 74(Sept. 1979):668-73. (In this paper Hill mentions a relevant, unpublished work: Chen, Wen-Chen. "On Zipf's Law." Ph.D. diss., University of Michigan, 1978.)
34. Price, Derek de Solla. "A General Theory of Bibliometric and Other Cumulative Advantage Processes." Journal of the ASIS 27(Sept.-Oct. 1976):293.
35. Kantor, Paul B. "A Note on Cumulative Advantage Distributions." Journal of the ASIS 29(July 1978):201-04; Price, Derek de Solla. "Cumulative Advantage Urn Games Explained: A Reply to Kantor." Journal of the ASIS 29(July 1978):204-06; and Rao, I.K. Ravichandra. "The Distribution of Scientific Productivity and Social Change." Journal of the ASIS 31(March 1980):111-22.
36. Allison, Paul D. "Estimation and Testing for a Markov Model of Reinforcement." Sociological Methods & Research 8(May 1980):434-53.
37. Simon, "On a Class of Skew Distribution Functions," p. 427.
38. Wyllys, "Measurement of Jargon Standardization."


General Bibliometric Models


JOHN J. HUBERT

Introduction
OVER THE PAST fifty years, a sizable body of literature dealing with
bibliometric models has developed. The early models were proposed
because they were observed to fit graphically certain specific empirical
frequency distributions. In many cases their functional forms were
identical, the similarity only noted by other writers years later. In each
case, depending on the subject field they applied to, there was a proliferation of papers which modified, extended, clarified, applied, and generalized the initial model.
Almost all bibliometric models relate, in a simple functional form,
one variable with another variable. For example, in journal productivity studies, for a bibliography covering a certain span of years on a
particular subject, a few journals contribute a large number of articles,
other journals contribute fewer, and so on in a monotonic sequence
ending with a large number of journals contributing one article each to
the subject. The two variables are number of journals and number of
articles. After arranging the journals in a decreasing order of productivity, a frequency-size distribution is obtained for the number of journals
containing a fixed number of articles each. Conversely, a frequency-rank table can be constructed for the number of articles associated with a
journal of fixed rank. These two approaches to observed patterns form
the two modes of the data tabulations.

John J. Hubert is Associate Professor, Department of Mathematics and Statistics, University of Guelph, Ontario.


To illustrate explicitly the notions of the frequency-size approach,
consider the following example. In table 1, f(n) denotes the number of
journals contributing exactly n articles each to a particular subject field
such that the total number of observed journals is $J = \sum f(n)$ and the total
number of observed articles is $N = \sum n f(n)$. This tabulation relates the
observations (the articles) with a class (a journal). The modeling problem is to find a mathematical equation relating f(n) with n. Associated
problems are: What is the process which generates this relationship?
What happens to the relationship if a larger sample of observations, N,
is obtained? Does the relationship remain the same from year to year?

TABLE 1
FREQUENCY-SIZE DISTRIBUTION OF THE NUMBER OF JOURNALS f(n) CONTRIBUTING n ARTICLES EACH

  n      f(n)     nf(n)
  1      102      102
  2       25       50
  3       13       39
  4        2        8
  5        7       35
  6        1        6
  7        3       21
  8        3       24
  9        1        9
 10        2       20
 13        2       26
 15        1       15
 18        1       18
 22        1       22
Sum    J = 164  N = 395

Source: S.C. Bradford. "Sources of Information on Specific Subjects." Engineering 137(1934):85-86.
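The totals J and N can be checked directly from the frequency-size pairs. The sketch below is only an illustration: the dictionary name `freq_size` is ours, and the counts are Bradford's lubrication data as tabulated in table 1.

```python
# Frequency-size data: n articles -> f(n) journals contributing exactly n articles.
freq_size = {1: 102, 2: 25, 3: 13, 4: 2, 5: 7, 6: 1, 7: 3, 8: 3,
             9: 1, 10: 2, 13: 2, 15: 1, 18: 1, 22: 1}

J = sum(freq_size.values())                    # total journals: sum of f(n)
N = sum(n * f for n, f in freq_size.items())   # total articles: sum of n * f(n)
print(J, N)  # 164 395
```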

In the last twenty-five years, it has been observed that such tabulations occur for other pairs of variables from a wide variety of natural and
social phenomena. Table 2 provides some examples of such combinations of observation versus class relationship.
TABLE 2
EXAMPLES OF OBSERVATION-CLASS RELATIONSHIP

Observation               Class
Number of articles        journals
Number of citations       persons
Number of insects         species
Length of word            words
Number of papers          authors
Number of occurrences     initial digits
Checked-out frequency     books
Number of occurrences     nouns
Length of sentence        sentences
Number of phonemes        words
Income level              persons

To understand the frequency-rank approach, consider the example
given in table 1. Near the bottom of the table there is one journal
contributing the most (twenty-two) articles. This journal is assigned the
rank 1. The next most productive journal is assigned rank 2 because it
contributed eighteen articles. This is continued, resulting in the
frequency-rank distribution given in table 3, where g(r) is the number of
articles contributed by the journal of rank r. Notice that there are two
journals contributing thirteen papers each, and each is assigned rank 5,
following the maximal-rank assignment method, which is used in the case of
ties. (If we assign the rank 4 to each of these journals, then we are using a
minimum-rank method; there are also the random-rank and average-rank methods.) The frequency-rank tabulation reverses the order of the
frequency-size tabulation, and gives priority to the most productive
journals. The frequency-size approach gives emphasis to the journals of
least productivity. There are other relationships between the two
approaches. Advantages and disadvantages of the frequency-rank
approach are discussed by Hubert and others.
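The tie-handling schemes described above can be sketched in a few lines of Python. This is a minimal illustration under our own naming (`frequency_rank` and its `method` argument are hypothetical, not taken from the literature); it converts Bradford's frequency-size counts into frequency-rank pairs, one pair per group of tied journals.

```python
# Frequency-size data: n articles -> f(n) journals (Bradford's lubrication data).
freq_size = {1: 102, 2: 25, 3: 13, 4: 2, 5: 7, 6: 1, 7: 3, 8: 3,
             9: 1, 10: 2, 13: 2, 15: 1, 18: 1, 22: 1}

def frequency_rank(freq_size, method="max"):
    """Return (rank, g) pairs, most productive journals first."""
    rows = []
    placed = 0                                   # journals ranked so far
    for n in sorted(freq_size, reverse=True):
        first, last = placed + 1, placed + freq_size[n]
        if method == "max":                      # maximal-rank: ties share the last position
            rank = last
        elif method == "min":                    # minimum-rank: ties share the first position
            rank = first
        elif method == "average":                # average-rank: midpoint of the tied positions
            rank = (first + last) / 2
        else:
            raise ValueError("unknown method: " + method)
        rows.append((rank, n))
        placed = last
    return rows

for rank, g in frequency_rank(freq_size, "max"):
    print(rank, g)
```

Under the maximal-rank method the two journals with thirteen articles receive rank 5, and under the minimum-rank method they receive rank 4, as in the text.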
TABLE 3
A FREQUENCY-RANK DISTRIBUTION OF THE NUMBER OF ARTICLES g(r) CONTRIBUTED BY A JOURNAL OF RANK r

  r     g(r)
  1      22
  2      18
  3      15
  5      13
  7      10
  8       9
 11       8
 14       7
 15       6
 22       5
 24       4
 37       3
 62       2
164       1

For the examples given in table 2, the literature contains many
models, and some are erroneously referred to as "laws" as if they predicted occurrences without error. From an analysis of these models, it
becomes apparent that some are for the frequency-size approach and
some are for the frequency-rank approach. The modeling problems
have different purposes, because from the data in table 1 the model can
be used to predict the number of journals contributing a fixed number
of articles, and from the data in table 3, the model can be used to predict
the number of articles contributed by a journal of a given rank. An
explanation of the list of all the different models which can be found to
be applicable to bibliometric phenomena, including the actual equation, the variables each relates to, the approach to obtain the equation,
and how they interrelate, would be extremely lengthy and beyond the
present scope and purpose of this article. However, each article in the
appendix to this paper contains a model which would be included in
this list because each adequately fits and models some form of tabulation. One word of caution is necessary: some of the models have been
declared as new and general, while others are self-declared and are
neither new nor general. There are survey articles on many of these
models, and some of these articles provide the mathematical equations,
historical developments, interrelationships, and examples of data sets
where the models have been useful.2
There are three models which are claimed to be general because
they possess two important properties: first, they include earlier models
as special cases; and second, they are applicable to a large class of
bibliometric variables. These are the models of Price, Bookstein and
Brookes. Bookstein especially has claimed that the major bibliometric
models (Bradford, Lotka and Zipf) are in fact "a single law that seems
capable of describing phenomena in a vast variety of subject areas."3
The three models of Price, Bookstein and Brookes are discussed in the
following sections, with special attention to their derivations and to
their appropriateness as general models that can account for some of the
individual models mentioned above.



Analysis of the Price Model

The Price model4 is also known as the cumulative advantage distribution (CAD) and can be defined as follows: if f(n) is the fraction of
contributors having n articles each, then $f(n) = (m + 1)B(n, m + 2)$, for n
= 1, 2, ..., with the parameter m > 0, and $B(\cdot, \cdot)$ is the Beta function. The
Beta function is a name for a fundamental integral* involving two
parameters, and there is no simple verbal expression for this function.5
The CAD was proposed as a frequency-size type model because it yields
the relative frequency or proportion of authors each of whom has
produced a fixed number of articles on a specific area over a fixed period
of time. Over a finite range of observational values of n, a distribution of
authors is obtained, and the model can be fitted so as to follow closely
the observed pattern. When the fit is statistically adequate it can be used,
for example, to predict the percentage of authors who have contributed
more than n papers each, and if n is large, this provides an estimate of
the set of so-called prolific authors on a subject area. Other important
uses such as in citation analysis have been illustrated by Price.
This model has as a rough approximation that f(n) is proportional
to $1/n^a$, where a > 0. This implies that as n increases, f(n) decreases, which
suggests that there are many authors having one paper each, and so on
in a decreasing fashion, with very few authors contributing many
papers. There is only one parameter in the model, and its value depends
on a particular data set. Price himself considers his model to be quite
general: "It provides a sound conceptual basis for such empirical laws as
the Lotka Distribution for Scientific Productivity, the Bradford Law for
Journal Use, the Pareto Law of Income Distribution, and the Zipf Law
for Literary Word Frequencies. It is therefore an underlying probability
mechanism of widespread application and versatility throughout the
social sciences."6
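To make the CAD formula concrete, the following Python sketch evaluates f(n) = (m + 1)B(n, m + 2) with the standard library's log-gamma function. The helper names `beta` and `cad` and the value m = 1 are illustrative assumptions; the observation that the large-n exponent approaches m + 2 follows from the standard asymptotic behavior of the Beta function and is not a statement taken from Price.

```python
import math

def beta(a, b):
    """Beta function B(a, b), computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def cad(n, m):
    """Fraction of contributors with exactly n articles: (m + 1) * B(n, m + 2)."""
    return (m + 1) * beta(n, m + 2)

m = 1.0  # illustrative parameter value, not fitted to any data set
fractions = [cad(n, m) for n in range(1, 5001)]
print(fractions[:3])     # fractions of contributors with 1, 2 and 3 articles
print(sum(fractions))    # close to 1, as a set of fractions should be

# Log-log slope between n = 1000 and n = 2000; it approaches -(m + 2) for large n.
slope = (math.log(cad(2000, m)) - math.log(cad(1000, m))) / math.log(2.0)
print(slope)
```

The nearly constant log-log slope in the tail is what the rough power-law approximation refers to.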
*The Beta function is also known as Euler's first integral and is defined as:

$$B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}, \qquad a > 1,\ b > 1,$$

where $n! = n(n-1)\cdots 3 \cdot 2 \cdot 1$, if n is an integer. Also, B(a,b) is approximately proportional
to $a^{-b}$ under certain conditions.

How does one obtain such a model? The early attempts before 1950
by Yule, Pareto, Zipf, and Bradford were based on plotting the data with
f(n) versus n, for example, then finding a mathematical equation which
would adequately represent the pattern observed in the particular discipline (Yule in biology, Pareto in economics, Zipf in linguistics, and
Bradford in journal productivity). In 1955, Simon derived the basic form
of the Price model, and proved it was a consequence of two assumptions.7 If a collection of N articles is found on a specific subject area, and
if f(n) represents the number of journals containing n articles each, then
in this bibliometric framework, the two assumptions are: (1) the probability that the next article found is in a journal which has already contributed n articles is proportional to nf(n), the total number of occurrences
of all articles from those journals which already have n articles each on
the subject area under study; and (2) there is a constant probability that
the next article found is from a new journal. These assumptions form
the basis of what is known as the stochastic birth or growth process.
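The two assumptions lend themselves to a small simulation. The sketch below is one possible reading of them, not Simon's own derivation: the function name `simulate_birth_process`, the starting state of one article in one journal, and the values alpha = 0.4 and seed = 1 are all assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_birth_process(total_articles, alpha, seed=0):
    """Grow a collection one article at a time; return a Counter {n: f(n)}."""
    rng = random.Random(seed)
    article_journal = [0]          # journal id of each article found so far
    next_journal_id = 1
    while len(article_journal) < total_articles:
        if rng.random() < alpha:
            # Assumption (2): constant probability that the article is from a new journal.
            article_journal.append(next_journal_id)
            next_journal_id += 1
        else:
            # Assumption (1): copying the journal of a uniformly chosen earlier article
            # selects the class of journals with n articles with probability
            # proportional to n * f(n).
            article_journal.append(rng.choice(article_journal))
    articles_per_journal = Counter(article_journal)
    return Counter(articles_per_journal.values())

f = simulate_birth_process(total_articles=395, alpha=0.4, seed=1)
for n in sorted(f):
    print(n, f[n])   # n articles, f(n) journals
```

Runs of this kind typically show the skewed shape of table 1: many journals with a single article and a few highly productive ones.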
Although the derivation by Simon is very rigorous and the statistical theory used is very advanced, it does result in the same model
equation that Price proposed twenty years later. Simon also established
the model's generality by showing that it contains: (1) the models of
Yule and Willis in biology, (2) the models of Zipf and Mandelbrot and
others in linguistics,8 (3) the models of Zipf in population growth, and
(4) the models of Pareto and Champernowne in income distributions.
The two assumptions of Simon are plausible, relatively simple and
satisfy many social processes; however, there is one drawback: they are
not unique, because other mechanisms can be shown to lead to the same
model equation. One of two other starting points is due to Simon
himself, and the other is, in fact, the Price starting point. These two
starting points will be considered separately.
Simon's second mechanism, in journal productivity terminology,
is as follows: Suppose we have a collection of N articles dispersed among
J journals such that f(n) represents the number of journals contributing
n articles each. Furthermore, suppose articles are added to the collection
according to the two assumptions of the former growth process, and
articles are dropped from the collection in such a way that the sample
size N remains constant. Simon then proves that the same model equation involving the Beta function can be derived if we assume that if an
article from a particular journal is dropped, then all articles from that
journal are dropped, and the probability that the next journal dropped
be one contributing exactly n articles is proportional to f(n). This added
assumption will account for articles leaving or entering the collection,
i.e., the processes of emigration and immigration. It also can be used to
mimic changes in distributions due to different time periods but constant sample sizes.
The Price starting point which generates this model equation
involving the Beta function is a modification of the classical Polya
urn scheme. Suppose the contents of an urn containing two types of
colored balls depend upon what was selected in previous draws. If a ball
of the first color is drawn (called a success), two or more balls of that
same color are replaced so that on the next draw there is an increased
chance of obtaining a ball of that color. The modification occurs when a
ball of the second color is drawn, in which case a single replacement of
that color is made so that in the next draw the chance of drawing this
second color is not increased. The net effect is that success increases the
chance of further success, whereas failure has no effect in changing the
chance of success or failure.
The success-breeds-success concept has some empirical evidence to
support it, e.g., in the sociological theory of publishing characteristics,
in citation analysis, and in usage patterns from retrieval systems in
libraries, as well as in biological and epidemic processes. Therefore,
what Price has accomplished is to begin at a different starting point (the
urn scheme) and end at the same final model equation as Simon did,
who started with the birth process assumptions.
In summary, the Price model equation involving the Beta function
has the following properties: (1) it is a frequency-size model; (2) it has
the limiting form that f(n) is proportional to $n^{-a}$, for some constant a >
0; (3) it approximates several models in the literature; (4) it is the same as
the model proposed by Simon; and (5) it can be derived from three
different starting points, two due to Simon and one due to Price.
Therefore, although Price's theory underlying the model is sound and
new, the model equation and its ability to describe bibliometric phenomena have been known since 1955. However, as a model equation it is
general because it satisfies our definition involving the two conditions:
it must model different variables, and it must contain or approximate
earlier models. It is interesting to note that the theory surrounding this
model equation is not entirely complete: "The surface has only been
scratched and doubtless the application of this theory will raise more
empirical testing and rigorous statistical mathematics in expression."9
Analysis of the Bookstein Model

In 1977 Bookstein proposed to find an expression for the expected
number of authors, f(n), in a discipline producing n articles over a
defined period of time, subject to sociological factors influencing productivity and other constraints. The factors used were society's need
for research and the use of rewards and threats for continued productivity. There were two constraints; the first was that Lotka's model be a
special case. (Lotka's model is also known as the inverse-square law, and
essentially states that f(n) is proportional to $1/n^2$, for n = 2, 3, ... .) The
second constraint is that if a publication distribution is observed over t
time periods (e.g., t = 10 years), then the function f should satisfy the
relation $f(tn) = f(t) \times f(n)$. Bookstein calls this the "symmetry property"
or the "invariance property."11 Bookstein claims that the only realistic
function satisfying these conditions and empirical data is f(n) proportional to $1/n^a$, where a is a positive number and estimable from the data.
(It is true that for this model equation we have Lotka's law when a = 2,
and furthermore, the symmetry property is satisfied since $f(tn) = 1/(tn)^a = (1/t^a)(1/n^a) = f(t)f(n)$.) It is also claimed that the model is the only one
which is unchanged whether the population of authors under study
remains the same, increases or decreases over time.12 This claim has not
been convincingly demonstrated.
There are four important observations which can be made about
this model:
1. The model equation is a special case of the model equation involving
the Beta function advocated by both Simon and Price. In fact, Bookstein recognizes this: "Simon's model and mine ...are not identical,
they converge at large n."13
2. The model equation is not the only possible equation satisfying his
two constraints.
3. The path to the model is different from the other paths discussed
earlier. In 1924 Yule used the empirical data fitting technique; in
1955 Simon used stochastic birth process assumptions; in 1976 Price
used the urn scheme mechanism; and in 1977 Bookstein used symmetry and other conditions to establish the model.
4. The model is not original. The form of the Bookstein model equation appears in earlier papers, as demonstrated in Fairthorne and
Hubert,14 where we see that the very early models of Pareto, Zipf and
Stevens, and later Naranan15 are exactly this model for the frequency-size tabulation. Hubert has proposed this same model equation for
the frequency-rank tabulation.16
The implication of the first observation is that the Bookstein model is a
special case of the model involving the Beta function. Therefore, in this
sense, the Bookstein model is less general. Also, since the model involving the Beta function fits many observable variables, because it is so
adjustable to a variety of shapes, and since the form $n^{-a}$ is not as
adjustable, then, in this sense, the Bookstein model is less general. We
will return to the property of generality in a later section.


Analysis of the Brookes Model

In 1977 Brookes claimed to have proposed a model which is "...an
empirical law of social behaviour which pervades all social activities
and for which Bradford's law can be regarded as a particular example."
Also, Brookes believes in "...the wide generality of the Bradford law."
This section considers the models of both Bradford and Brookes since
they are apparently related.
In 1934 Bradford stated his famous model after examining how 395
articles on lubrication were dispersed among 164 different journals.
The actual data are given in table 4, where G(r) is the total number of
articles in the first r most productive journals. The Bradford model is
$G(r) = a + b \log(r)$, where r = 1, 2, ... and a and b are parameters depending
on the subject area. When the cumulative totals of articles are plotted
against the logarithm of r an almost straight-line relationship results.
This approach gives priority to the most productive journals. When
tables 3 and 4 are compared, it is clear that the variable r is the same. This
is the reason the Bradford model is called a ranking type of model.
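How close the relationship is to a straight line can be examined with an ordinary least-squares fit of G(r) against log r. The sketch below is illustrative only: it uses the natural logarithm (the base is an assumption, and a different base merely rescales b), the function name `fit_bradford` is ours, and the data are Bradford's cumulative totals for the lubrication literature.

```python
import math

# Cumulative number of journals r and cumulative number of articles G(r).
ranks = [1, 2, 3, 5, 7, 8, 11, 14, 15, 22, 24, 37, 62, 164]
cumulative = [22, 40, 55, 81, 101, 110, 134, 155, 161, 196, 204, 243, 293, 395]

def fit_bradford(ranks, cumulative):
    """Least-squares estimates (a, b) for G(r) = a + b * log(r)."""
    xs = [math.log(r) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cumulative) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_bradford(ranks, cumulative)
for r, G in zip(ranks, cumulative):
    # Observed cumulative total next to the fitted value a + b * log(r).
    print(r, G, round(a + b * math.log(r), 1))
```

Comparing the two printed columns shows where the fitted line departs from the data, for example at the smallest ranks.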
Brookes argues that this model can be used in other social contexts
whenever sources of an activity are ranked in order of decreasing activity. This approach of ranking is very important to Brookes: "Ranking
by frequency is a technique widely used and understood....
Ranking is
more primitive than measuring. We learn to rank before we learn to
speak or count. It is because ranking is a primitive action which permeates all social activities that it is time it were taken more seriously."19
It is probably true that papers on bibliometric modeling refer more to
the Bradford model than to any other model. We will not digress further
on the Bradford model, but consider the Brookes model.
The structural form of the model proposed by Brookes is much
more complicated than the Bradford model: if g(r) is the number of
references in the rth most productive journal, then

j =r

where r = 1, 2, ..., m > 0 is a parameter, k is a quantity depending on m,
and $r! = r(r-1)(r-2)\cdots 3 \times 2 \times 1$. Unfortunately, this equation has no simple
verbal or mathematical expression, but it does possess several properties
which clarify its form:

1. The variable r acts as a rank because it is equivalent to the maximum-rank assignment scheme mentioned earlier.
2. The mathematical properties are proper since the infinite series
converges, $g(r) \to 0$ as $r \to \infty$ and $g(1) \ge g(2) \ge \dots$, i.e., monotonicity.
3. The model relates the number of references, g(r), with the rank r,
whereas the Bradford model relates the cumulative number of references, $G(r) = \sum_{s=1}^{r} g(s)$, with the rank r; that is, the Brookes model is a
frequency function and the Bradford is a distribution function.
4. When m is large and when we consider cumulative totals, the
Brookes model does conform to the Bradford model, i.e., $\sum_{s=1}^{r} g(s) = a + b \log(r)$.
5. The model gives priority to the most productive journals because the
journals with only a few articles are in the tail of the frequency
function.
6. The model is based on the well-known Poisson discrete random variable which also possesses a countable infinite number of values.
7. The model is adjustable to a variety of shapes.
8. The model is entirely new, and its exact structure is not like any
other model.

TABLE 4
THE BRADFORD-TYPE TABULATION OF THE ACCUMULATED NUMBER OF REFERENCES G(r) CONTAINED IN THE FIRST r MOST PRODUCTIVE JOURNALS

Accumulated No. of Journals    Accumulated No. of References
            r                              G(r)
            1                               22
            2                               40
            3                               55
            5                               81
            7                              101
            8                              110
           11                              134
           14                              155
           15                              161
           22                              196
           24                              204
           37                              243
           62                              293
          164                              395

Brookes calls his model the mixed Poisson model because the
derivation depends on a mix of Poisson random variables. In general
terms, the mix occurs as follows: for the sum X1 + X2 + ... + Xn we assume
not only that the Xs are independent Poisson random variables, but also
that n, the number of variables, is a Poisson random variable. This is the
concept of a random sum of random variables instead of a fixed sum of
random variables. More specifically, the underlying assumptions of the
Brookes model can be reduced to the following: (1) the number of
articles produced by a journal per unit time is a Poisson random
variable with mean rate of, e.g., θ; and (2) the total number of journals,
each producing at mean rate θ, is inversely proportional to θ. The
second assumption is consistent with the observation that as the rate of
production increases, the number of journals decreases, or the most
productive journals (lowest rank numbers) produce the greatest
numbers of articles. The derivation is therefore based on realistic
assumptions.
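The distinction between a fixed sum and a random sum of Poisson variables can be illustrated by simulation. The Python sketch below is not Brookes's derivation: it does not mix over the rate θ, and the means (3 for the number of terms, 2 for each term), the seed, and all function names are assumptions chosen only for illustration. For these values the standard results give both sums a mean of 6, while the variance is 6 for the fixed sum and 18 for the random sum, and the simulated figures should fall near those values.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def fixed_sum(n, x_mean, rng):
    """X1 + ... + Xn for a fixed n, each Xi ~ Poisson(x_mean)."""
    return sum(poisson(x_mean, rng) for _ in range(n))


def random_sum(n_mean, x_mean, rng):
    """X1 + ... + Xn where n itself is drawn from Poisson(n_mean)."""
    return fixed_sum(poisson(n_mean, rng), x_mean, rng)


def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


rng = random.Random(1981)  # arbitrary seed, used only for repeatability
fixed_draws = [fixed_sum(3, 2.0, rng) for _ in range(20000)]
random_draws = [random_sum(3.0, 2.0, rng) for _ in range(20000)]
print("fixed n = 3:    mean %.2f, variance %.2f" % mean_and_variance(fixed_draws))
print("n ~ Poisson(3): mean %.2f, variance %.2f" % mean_and_variance(random_draws))
```

The extra variance of the random sum comes entirely from the uncertainty in the number of terms, which is the feature the mixed model exploits.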
Another interesting consequence of Brookes's model is his modification of the Bradford model. Earlier, Brookes proposed a hybrid form
for the Bradford model to account for the nonlinearity at the beginning
of observed distributions.²⁰ He suggested the modified Bradford model:
$$G(r) = \begin{cases} \alpha r^{\beta}, & r = 1, 2, \ldots, c, \\ a + b \log r, & r = c + 1, c + 2, \ldots, n. \end{cases}$$

Notice that for r = 1,2, ..., c the function is a curve, and for large values
the function is a straight line function of log r. To conform to Brookes's
new model and other observed distributions, he now suggests two
hybrids, called Type I and Type II, which he claims take the form:
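$$G(r) = \begin{cases} \log_b \left[ \left( a + \sum_{j=0}^{r} \alpha^{c-j} \right) / a \right], & r = 1, 2, \ldots, c, \\ \log_b \left[ (a + r)/a \right], & r = c + 1, c + 2, \ldots, n, \end{cases}$$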

where b = (a + n)/a, and α < 1 for Type I and α > 1 for Type II.
Graphically, these functions appear in figure 1, where hybrid Type I is
convex initially and hybrid Type II is concave (with respect to the r-axis)
initially. The hybrids are consequences of his model and illustrate its
ability to adjust to anomalies.
In summary, the Brookes model is included in this article because
of its properties and its declared generality. To quote Brookes: "The
main advantage of the model is that it shows how the log law, and
therefore how the hybrid forms of the Bradford law, can be derived in a
realistic and natural way from orthodox frequency statistics"; and "in
its present form it is the simplest possible stochastic model of the
Bradford law, but it can easily be modified, for example, to embrace
problems of growth and obsolescence - the classical birth and death
process of stochastic theory."²¹

Fig. 1. The Brookes hybrid types of Bradford's model
Source: Bertram C. Brookes. "Theory of the Bradford Law." Journal of Documentation
33(Sept. 1977):193.

The Validity of the Generalizations


Let us now return to the question of whether the models of Price,
Bookstein and Brookes are valid general models. It should be stressed
that the structural form of the Brookes model is new, but the Price and
Bookstein models are not new. We have shown that the Price model was
first proposed by Simon in 1955 and that the Bookstein model has been
proposed by many others.²² However, we have explained how the
assumptions underlying the models are original and indeed helpful in
the understanding of the processes which could generate the models.
With respect to their generality, it has been demonstrated that all
three models possess the two properties of the original criterion, that is,
they include earlier models as special cases, and they are applicable to a
larger class of bibliometric variables. However, these general models are
limited in that they consider only the effect of one variable upon
another. Nature and life are not so simple. In fact, in bibliometrics,
recent articles have attempted to model one response variable as a
function of two or more variables. Also, on one source (journal, author,
etc.) more than one response variable has been measured. These two
approaches will change our definition of generality because such multivariate models will necessarily include the univariate models. It is a
simplistic viewpoint of reality to believe one variable in a social interactive process can be adequately predicted solely by one other variable. A
univariate model does not become more general by merely including
more parameters.
Examples of models of greater statistical sophistication can be
found: Bayesian models in interactive and retrieval systems,²³ methods
for evaluating articles,²⁴ stochastic literature growth models,²⁵ modeling
duration of book misplacement,²⁶ measures of literature concentration using
the Whitworth model in frequency-rank distributions,²⁷ modeling relationships
between title length and number of coauthors,²⁸ properties of
modeling,²⁹ and prediction models using time-series methods.³⁰
This latest research differs from earlier work in bibliometrics in
that it uses models that are nonlinear and that consider the effect of
several variables, i.e., they are multivariate. These models require the
estimation of at least two parameters, whereas the simpler univariate
models required only one. The maximum likelihood method, the minimum chi-square method, and the ordinary linear least-squares method
have been used. However, estimation for nonlinear functions requires
care. If a model is linear and of the form $Y = \alpha + \beta X + \varepsilon$ (where the
random variable $\varepsilon$ must have structure if confidence limits are to be
established), we speak of an additive model for the variable Y depending
on the variable X. If $Y = \alpha X^{\beta} \varepsilon$, then this is an example of a
multiplicative model. Taking logarithms on both sides, we have
$\log Y = \log \alpha + \beta \log X + \log \varepsilon$, which is of the form
$Y = \alpha + \beta X + \varepsilon$. We have linearized the model, where $\log \varepsilon$ is
normally distributed if $\varepsilon$ has a lognormal structure. For the nonlinear
model $Y = \alpha X^{\beta} + \varepsilon$, taking logarithms yields
$\log Y = \log(\alpha X^{\beta} + \varepsilon)$, which does not collapse into a linear form.
This simple fact is often overlooked, and the estimation of parameters for
such models requires nonlinear estimation theory.³¹
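The linearization of the multiplicative model can be sketched in a few lines of Python. The example generates artificial data from a multiplicative model with lognormal error and then recovers the parameters by ordinary least squares on the logarithms. The parameter values (2.0 and 1.5), the error standard deviation, the seed, and the names `fit_line`, `alpha_hat` and `beta_hat` are all assumptions made for illustration. The same approach would not be appropriate for the additive-error form of the model, which calls for nonlinear estimation.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(0)  # arbitrary seed, used only for repeatability
TRUE_ALPHA, TRUE_BETA = 2.0, 1.5  # hypothetical parameter values

# Artificial data from the multiplicative model with lognormal error.
xs = list(range(1, 51))
ys = [TRUE_ALPHA * x ** TRUE_BETA * math.exp(rng.gauss(0.0, 0.1)) for x in xs]

# Taking logarithms turns the model into a straight line in log X.
log_alpha_hat, beta_hat = fit_line([math.log(x) for x in xs],
                                   [math.log(y) for y in ys])
alpha_hat = math.exp(log_alpha_hat)
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```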
The use of multivariate models also requires greater care. If Y is
found to be functionally dependent on p variables $X_1, X_2, \ldots, X_p$, such as
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$, then we have a multiple regression
model. If the response on a single subject is a set of variables $Y_1, \ldots, Y_m$,
which may be correlated and are functionally dependent on a set of
variables $X_1, \ldots, X_p$, then we have a multivariate regression model. The
latter situation can utilize techniques such as cluster, factor and multivariate
time-series analyses. Although recent articles in retrieval systems
are using time-series methodology, the simpler models listed earlier in
this article are not multivariate, and it should be possible to exploit
multivariate methods to achieve clarity and more generality.
Summary

The frequency-size and frequency-rank approaches, the two basic
approaches in a class of bibliometric models, have been explained. The
twenty-eight known models have been cited, and the three models due to
Price, Bookstein and Brookes have been analyzed by considering their
internal properties, interrelationships and generality. Because they
have a sound but different statistical foundation, they possess validity;
however, except for possibly Price's model, it is clear that the models are
not used in everyday prediction problems in library and information
science. Also, it has been shown that the Price and Bookstein models are
not new. The three models are of limited generality because they are
univariate and simple. Examples of more sophisticated models have
been cited, and remarks have been made to suggest how greater generality
can be achieved by using multivariate methods.³²

References
1. Hubert, John J. "Analysis of Data by a Rank-Frequency Model." Ph.D. diss.,
State University of New York at Buffalo, 1974; Brookes, Bertram C., and Griffiths, Jose M.
"Frequency-Rank Distributions." Journal of the ASIS 29(Jan. 1978):5-13; Hubert, John J.
"Bibliometric Models for Journal Productivity." Social Indicators Research 4(Oct.
1977):441-73; and Hubert, John J. "A Relationship Between Two Forms of Bradford's Law."
Journal of the ASIS 29(Jan. 1978):159-61.
2. Simon, Herbert A. "On a Class of Skew Distribution Functions." Biometrika
42(Dec. 1955):425-40; Brookes and Griffiths, "Frequency-Rank Distributions"; Price,
Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963;
Fairthorne, Robert A. "Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot)
for Bibliometric Description and Prediction." Journal of Documentation 25(Dec.
1969):319-43; Brookes, Bertram C. "Theory of the Bradford Law." Journal of Documentation
33(Sept. 1977):180-209; Bookstein, Abraham. "The Bibliometric Distributions."
Library Quarterly 46(Oct. 1976):416-23; and Hubert, "Bibliometric Models."
3. Bookstein, Abraham. "Explanations of the Bibliometric Laws." Collection
Management 3(Summer/Fall 1979):151-62.
4. Price, Derek de Solla. "A General Theory of Bibliometric and Other Cumulative
Advantage Processes." Journal of the ASIS 27(Sept.-Oct. 1976):292-306.

5. Ibid.
6. Ibid., pp. 292-93.
7. Simon, "On a Class of Skew Distribution Functions."
8. See Hubert, John J. "Linguistic Indicators." Social Indicators Research 8(June
1980):223-55.
9. Price, "A General Theory," p. 304.
10. Bookstein, Abraham. "Patterns of Scientific Productivity and Social Change: A
Discussion of Lotka's Law and Bibliometric Symmetry." Journal of the ASIS 28(July
1977):206-10.
11. Ibid., p. 208; and Bookstein, "Explanations of the Bibliometric Laws," p. 159.
12. Bookstein, "Patterns of Scientific Productivity," pp. 206-10.
13. Bookstein, "Bibliometric Distributions," p. 422.
14. Fairthorne, "Empirical Hyperbolic Distributions"; and Hubert, "Bibliometric
Models."
15. Naranan, S. "Power Law Relations in Science Bibliography - A Self-Consistent
Interpretation." Journal of Documentation 27(June 1971):83-97.
16. Hubert, "Analysis of Data."
17. Brookes, "Theory of the Bradford Law," p. 180.
18. Bradford, S.C. "Sources of Information on Specific Subjects." Engineering
137(26 Jan. 1934):85-86.
19. Brookes, "Theory of the Bradford Law," p. 203.
20. Brookes, Bertram C. "The Derivation and Application of the Bradford-Zipf
Distribution." Journal of Documentation 24(Dec. 1968):247-65.
21. Brookes, "Theory of the Bradford Law," pp. 185, 202.
22. Fairthorne, "Empirical Hyperbolic Distributions"; and Hubert, "Bibliometric
Models."
23. Tague, Jean M. "A Bayesian Approach to Interactive Retrieval." Information
Storage and Retrieval 9(March 1973):12-42; Bookstein, Abraham, and Cooper, William.
"A General Mathematical Model for Information Retrieval Systems." Library Quarterly
46(April 1976):153-67; and Inhaber, H. "Canadian Scientific Journals: Part II,
Interaction." Journal of the ASIS 26(Sept.-Oct. 1975):290-93.
24. Virgo, Julie A. "A Statistical Procedure for Evaluating the Importance of
Scientific Papers." Library Quarterly 47(Oct. 1977):415-30.
25. Braun, Tibor, et al. "Literature Growth and Decay: An Activation Analysis
Résumé." Analytical Chemistry 49(July 1977):682-88.
26. Cooper, Michael D., and Wolthausen, John. "Misplacement of Books on Library
Shelves: A Mathematical Model." Library Quarterly 47(Jan. 1977):43-57.
27. Pratt, Allan D. "A Measure of Class Concentration in Bibliometrics." Journal of
the ASIS 28(Sept. 1977):285-92; and Carpenter, Mark P. "Similarity of Pratt's Measure of
Class Concentration to the Gini Index." Journal of the ASIS 30(March 1979):108-10.
28. Kuch, T.D.C. "Relation of Title Length to Number of Authors in Journal
Articles." Journal of the ASIS 29(July 1978):200-02.
29. Rouse, William B. "Tutorial: Mathematical Modeling of Library Systems."
Journal of the ASIS 30(March 1979):181-92.
30. Kang, Jong H., and Rouse, William B. "Approaches to Forecasting Demands for
Library Network Services." Journal of the ASIS 31(July 1980):256-63.
31. See, for example, Wold, Herman. "Nonlinear Estimation by Iterative Least
Squares Procedures." In Research Papers in Statistics, edited by Florence N. David, pp.
411-44. New York: Wiley & Sons, 1966.
32. Research for this paper was partially supported by NSERC Grant No. A9229.


Appendix
Articles Containing Models of Bibliometric Phenomena
Benford, Frank. "The Law of Anomalous Numbers." Proceedings of the American Philosophical Society 78(1938):551-72.
Bookstein, Abraham. "Patterns of Scientific Productivity and Social Change: A Discussion of Lotka's Law and Bibliometric Symmetry." Journal of the ASIS 28(July
1977):206-10.
Bradford, S.C. "Sources of Information on Specific Subjects." Engineering 137(26 Jan.
1934):85-86.
Brookes, Bertram C. "The Derivation and Application of the Bradford-Zipf Distribution." Journal of Documentation 24(1968):247-65.
Brookes, Bertram C., and Griffiths, J.M. "Frequency-Rank Distributions." Journal of the ASIS
29(1978):5-13.
Cole, P.F. "A New Look at Reference Scattering." Journal of Documentation 18(June
1962):58-64.
Goffman, William, and Newill, Vaun A. "Generalization of Epidemic Theory: An Application to the Transmission of Ideas." Nature 204(17 Oct. 1964):225-28.
Good, I.J. "Distribution of Word Frequencies." Nature 179(16 March 1957):595.
Good, I.J. "The Population Frequencies of Species and the Estimation of Population
Parameters." Biometrika 40(Dec. 1953):237-64.
Harris, Bernard. "Determining Bounds on Integrals with Application to Cataloging
Problems." Annals of Mathematical Statistics 30(June 1959):521-48.
Harris, Bernard. "Statistical Inference in the Classical Occupancy Problem: Unbiased Estimation of the Number of Classes." Journal of the ASIS 63(Sept. 1968):837-47.
Herdan, Gustav. Type-Token Mathematics: A Textbook of Mathematical Linguistics.
The Hague: Mouton, 1960, pp. 182-85.
Hubert, John J. "Analysis of Data by a Rank-Frequency Model." Ph.D. diss., Dept. of
Statistics, SUNY-Buffalo, 1974.
Kendall, Maurice G. "Natural Law in the Social Sciences." Journal of the Royal Statistical Society, Series B 124(1961):1-16.
Leimkuhler, Ferdinand. "The Bradford Distribution." Journal of Documentation
23(Sept. 1967):197-207.
Lotka, A.J. "The Frequency Distribution of Scientific Productivity." Journal of the
Washington Academy of Sciences 16(1926):317-23.
Naranan, S. "Power Law Relations in Science Bibliography - A Self-Consistent Interpretation." Journal of Documentation 27(June 1971):83-97.
Pareto, Vilfredo. Cours d'Économie Politique. Lausanne: F. Rouge & Cie., 1896. See esp.
vol. 2, Sec. 3.
Plackett, R.L. "The Truncated Poisson Distribution." Biometrics 9(Dec. 1953):485-88.
Price, Derek de Solla. "A General Theory of Bibliometric and Other Cumulative Advantage Processes." Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
Rao, I.K. Ravichandra. "The Distribution of Scientific Productivity and Social Change."
Journal of the ASIS 31(March 1980):111-22.
Resnikoff, H.L., and Dolby, J.L. Access: A Study of Information Storage and Retrieval
with Emphasis on Library Information Systems (Final Report HEW Proj. 8-0548,
1972).
Simon, Herbert A. "On a Class of Skew Distribution Functions." Biometrika 42(Dec.
1955):425-40.
Stevens, S.S. "On the Psychophysical Law." Psychology Review 64(1957):153-81.
Vickery, B.C. "Bradford's Law of Scattering." Journal of Documentation 4(Dec. 1948):
198-203.

Willis, John C. Age and Area: A Study in Geographical Distribution and Origin of
Species. Cambridge, Eng.: University Press, 1922.
Yule, G. Udny. "A Mathematical Theory of Evolution, Based on the Conclusions of Dr.
John C. Willis, F.R.S." Philosophical Transactions of the Royal Society, Series B
213(1924):21-87.
Zipf, George K. Human Behavior and the Principle of Least Effort. Cambridge, Mass.:
Addison-Wesley, 1949.


Citation Analysis
LINDA C. SMITH

If I have seen farther, it is by standing on the shoulders of giants.


-Isaac Newton

Introduction

AN ESSENTIAL PART of research papers, particularly in the sciences, is the
list of references pointing to prior publications. As Ziman observes, "a
scientific paper does not stand alone; it is embedded in the literature of
the subject." A reference is the acknowledgment that one document
gives to another; a citation is the acknowledgment that one document
receives from another.² In general, a citation implies a relationship
between a part or the whole of the cited document and a part or the
whole of the citing document.³ Citation analysis is that area of
bibliometrics which deals with the study of these relationships.
There are many published studies exploring citation analysis and
its applications. Some reviews of this literature have already appeared,⁴
and Hjerppe has compiled a bibliography of more than 2000 entries
including many studies in citation analysis. Eugene Garfield's writings
are a rich source of information on this subject, particularly his book on
citation indexing and many of his "Current Comments" columns
reprinted from Current Contents.⁸ The present paper does not attempt
to review this extensive literature in detail. Instead, it focuses on the
development of citation analysis as a research method, uses and abuses
of this method, and prospects for the future.
Linda C. Smith is Assistant Professor, Graduate School of Library and Information
Science, University of Illinois at Urbana-Champaign.

As noted above, a citation represents a relationship between the
cited and citing documents. The nature of this relationship is somewhat
difficult to characterize, however, due to the many reasons authors cite,
such as the fifteen enumerated by Garfield:
1. Paying homage to pioneers
2. Giving credit for related work (homage to peers)
3. Identifying methodology, equipment, etc.
4. Providing background reading
5. Correcting one's own work
6. Correcting the work of others
7. Criticizing previous work
8. Substantiating claims
9. Alerting to forthcoming work
10. Providing leads to poorly disseminated, poorly indexed, or uncited
work
11. Authenticating data and classes of fact: physical constants, etc.
12. Identifying original publications in which an idea or concept was
discussed
13. Identifying original publications or other work describing an eponymic
concept or term...
14. Disclaiming work or ideas of others (negative claims)
15. Disputing priority claims of others (negative homage).⁹

Bavelas suggests that the two extremes of this array of reasons might be
true scholarly impact at the one end (e.g., significant use of the cited
author's theory, paradigm, or method) and less-than-noble purposes at
the other (e.g., citing the journal editor's work or plugging a friend's
publications).¹⁰ Furthermore, it is possible that norms for citing vary
from discipline to discipline.
Just as there are a number of reasons why citations exist, there may
be a number of reasons why a citing author has not provided a link to
certain other documents. Although the most obvious reason is that a
prior document is not relevant to the present work, it may also be due to
the fact that the author was not aware of the document, or could not
obtain it, or could not read the language in which it was published. As
Kochen observes: "it is not surprising that there is a great deal of
arbitrariness in the way authors select references for their bibliographies.
Undoubtedly, many documents which should have been cited are
missed; and many documents which the author does cite are only
slightly relevant."¹¹
In spite of the uncertainties associated with the nature of the
citation relationship, citations are attractive subjects of study because
they are both unobtrusive and readily available. Unlike data obtained by
interview and questionnaire, citations are unobtrusive measures that do
not require the cooperation of a respondent and that do not themselves
contaminate the response (i.e., they are nonreactive).¹² Citations are
signposts left behind after information has been utilized and as such
provide data by which one may build pictures of user behavior without
ever confronting the user himself. Any set of documents containing
reference lists can provide the raw material for citation analysis, and
citation counts based on a given set of documents are precise and
objective.
Development of Citation Analysis
The development of citation analysis has been marked by the
invention of new techniques and measures, the exploitation of new
tools, and the study of different units of analysis. These trends have led
to a rapid growth in both the number and types of studies using citation
analysis.
The easiest technique to use is a citation count, determining how
many citations have been received by a given document or set of documents
over a period of time from a particular set of citing documents.
When this count is applied to articles appearing in a particular journal,
it can be refined by calculating the impact factor, the average number of
citations received by articles published in a journal during a specified
time period. This measure allows one to compare the impact of
journals which publish different numbers of articles. Pinski and Narin
have developed further refinements of citation counts which take into
account the length of papers, the prestige of the citing journal, and the
different referencing characteristics of different segments of the
literature.¹³
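The two basic measures just described, the citation count and the impact factor, can be sketched in Python as follows. The article identifiers, journal names and citation links are entirely hypothetical, and the impact factor is computed simply as citations received divided by articles published within the sample, without the specific time windows that a published impact factor would use.

```python
from collections import Counter

# Hypothetical data: the journal in which each cited article appeared.
ARTICLE_JOURNAL = {"a1": "J1", "a2": "J1", "a3": "J1", "a4": "J1",
                   "b1": "J2", "b2": "J2"}

# Hypothetical citation links as (citing document, cited article) pairs.
CITATIONS = [("c1", "a1"), ("c2", "a1"), ("c3", "a1"), ("c1", "a2"),
             ("c4", "a3"), ("c5", "a3"), ("c2", "b1"), ("c3", "b1"),
             ("c4", "b1"), ("c5", "b2")]


def citation_counts(citations):
    """Number of citations received by each cited article."""
    return Counter(cited for _, cited in citations)


def impact_factors(article_journal, citations):
    """Average number of citations per published article, by journal."""
    counts = citation_counts(citations)
    total_citations = Counter()
    total_articles = Counter()
    for article, journal in article_journal.items():
        total_articles[journal] += 1
        total_citations[journal] += counts[article]  # 0 if never cited
    return {j: total_citations[j] / total_articles[j] for j in total_articles}


counts = citation_counts(CITATIONS)
factors = impact_factors(ARTICLE_JOURNAL, CITATIONS)
print(counts)
print(factors)
```

In this artificial sample journal J1 receives more citations in total (six against four), but journal J2 has the higher average per article, which is the kind of comparison the impact factor is intended to permit.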
Two techniques have been devised to identify documents likely to
be closely related: bibliographic coupling and cocitation analysis.¹⁴
Two documents are bibliographically coupled if their reference lists
share one or more of the same cited documents. Two documents are
cocited when they are jointly cited in one or more subsequently published
documents. Thus in cocitation earlier documents become linked
because they are later cited together; in bibliographic coupling later
documents become linked because they cite the same earlier documents.
The difference is that bibliographic coupling is an association intrinsic
to the documents (static), while cocitation is a linkage extrinsic to the
documents, and one that is valid only so long as they continue to be
cocited (dynamic).¹⁶ The theory and practical applications of bibliographic
coupling and cocitation analysis have been reviewed by Weinberg and
Bellardo, respectively.¹⁷ Citation counts and bibliographic
coupling were the characteristic citation analysis techniques in the
1960s, but in the 1970s cocitation analysis became the focus of much
research activity. Cocitation analysis is of particular interest as a means
for mapping scientific specialties.¹⁸
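The difference between the two linkages can be made concrete with a short Python sketch. Given the reference lists of a set of citing documents, bibliographic coupling is measured between pairs of citing documents and cocitation between pairs of cited documents. The document labels and reference lists below are hypothetical, as are the function names; the raw counts shown are only the simplest possible measures of strength.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reference lists: citing document -> set of cited documents.
REFERENCES = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "B"},
}


def coupling_strength(references):
    """Number of shared references for each pair of citing documents."""
    strength = {}
    for d1, d2 in combinations(sorted(references), 2):
        shared = len(references[d1] & references[d2])
        if shared:
            strength[(d1, d2)] = shared
    return strength


def cocitation_counts(references):
    """Number of citing documents that cite both members of each cited pair."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


coupling = coupling_strength(REFERENCES)
cocitation = cocitation_counts(REFERENCES)
print(coupling)
print(cocitation)
```

Adding a later citing document to `REFERENCES` can change the cocitation counts of A, B, C and D but cannot change the coupling between P1, P2 and P3, which reflects the static and dynamic character of the two measures noted above.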
Use of new techniques in citation analysis has been made possible
by the availability of new tools. Early citation studies frequently were
based on lists of references found in articles appearing in a small
number of journals. Citations had to be transcribed and manipulated by
hand. Because of the tediousness of this process, most studies were
necessarily quite limited in scope. The availability of the computer has
significantly improved this situation in two ways: through the production
of printed indexes which contain citation data from thousands of
documents,¹⁹ and through the analysis of citation data available in
machine-readable form. Products of the Institute for Scientific Information
(ISI) now provide a wealth of data for citation analysis. Subject
coverage has been expanded from the initial Science Citation Index
(SCI) to include the Social Sciences Citation Index (SSCI) and the Arts
and Humanities Citation Index (A&HCI) as well. And with each passing
year the time coverage becomes more extensive: SCI dates from
1961, SSCI from 1966, and A&HCI from 1976. In 1973, ISI introduced
the Journal Citation Reports (JCR), a companion volume to the citation
index which includes rankings of journals by citations and by
impact factor, as well as two ranked lists for each journal covered: those
journals which cite a given journal most heavily, and those journals
which a given journal most frequently cites.²⁰ At present, JCR volumes
are available for both SCI and SSCI.
Although discussion thus far has suggested counting citations only
for individual articles or journals, in fact various levels of aggregation
are possible. The units of analysis can be individual articles or books,
journals, authors, industrial organizations,²¹ academic departments,
universities, cities, states, nations, and even telescopes.²² If one assumes
that citations are indicators of importance, then one can use such
analyses to determine the most important scholars, publications,
departments, etc., in a particular discipline or subdiscipline. This
assumption is just one of several which deserves closer scrutiny if the
results of citation analyses are to be understood.

Critique of Citation Analysis


Critics have questioned both the assumptions and methods of
many studies found in the citation analysis literature. The strongest
advocates of citation analysis recognize its limitations and exercise care
in its applications.²³ Unfortunately, other investigators seem to be
unaware of these limitations and misinterpret the results of their analyses.
This section of the paper will enumerate both the assumptions
underlying citation analysis and the limitations of citation data, setting
the stage for the discussion of applications which follows.
Assumptions frequently underlying citation analysis are described
below, together with supporting evidence and/or counter-examples.
1. Citation of a document implies use of that document by the
citing author. This assumption actually has two parts: (1) the author
refers to all, or at least to the most important, documents used in the
preparation of his work; and (2) all documents listed were indeed used,
i.e., the author refers to a document only if that document has contributed
to his work. Failure to meet these two conditions leads to sins of
omission and commission:²⁴ certain documents are underrated because
not all items used were cited, and other documents are overrated because
not all items cited were used. With respect to underrating, it should be
evident to anyone who has written a paper that citation does not
necessarily fully and faithfully reflect usage. Often what is cited is only a
small percentage of what is read; not all that is read and found useful is
cited. Although the author usually does not provide any evidence of
omissions, there are exceptions. Consider a paper by Bottle which has as
its reference 29: "Reference omitted to avoid embarrassing its author!"²⁵
With respect to overrating, Davies offers a fundamental law of reference
giving: "it is quite unnecessary to have read or even seen the
reference yourself before quoting it."²⁶ Without looking at the text of
both the citing and cited documents, it may not be possible to make a
judgment as to whether a particular citation does indeed represent use of
material in the cited document.
2. Citation of a document (author, journal, etc.) reflects the merit
(quality, significance, impact) of that document (author, journal, etc.).
The underlying assumption in the use of citation counts as quality
indicators is that there is a high positive correlation between the number
of citations which a particular document (author, journal, etc.) receives
and the quality of that document (author, journal, etc.).²⁷ The use of
citation analyses for evaluative purposes is the issue that has generated
the most discussion. While Bayer and Folger note that measures derived
from citation counts have high face validity,²⁸ Thorne argues that
citation counts have spurious validity because documents can be cited
for reasons irrelevant to their merit.²⁹ Nevertheless, this assumption has
been tested and has found support in a number of studies, including
studies of scientific papers, journals and scholars.³⁰ In each case some
nonbibliometric measure(s) of quality must be compared with bibliometric
measures based on citation counts. The difficulty is that quality
is a complex attribute, and there generally is no single widely accepted
nonbibliometric measure. Furthermore, one cannot automatically
assume that an infrequently cited document (author, journal, etc.) is
without merit. In the case of journals, for example, the usefulness of
citations as a measure of the journal's quality varies according to the
function of the journal; news journals may be of high quality but
infrequently cited. Until more is understood about the reasons for
citing, citation counts can at best be viewed as a rough indicator of
quality. Small differences in citation counts should not be interpreted as
significant, but large differences may be interpreted as reflections of
differences in quality and impact. Results of citation counts should be
compared with alternative quality indicators to look for correlations.
The validity of the measure is most fragile in citation counts for individual
documents and authors. One can have more confidence in comparisons
of counts based on larger units, such as journals.
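One simple way to compare citation counts with an alternative quality indicator is a rank correlation, which depends only on the ordering of the items and so is insensitive to small differences in counts. The Python sketch below computes Spearman's rank correlation coefficient for six documents; the citation counts and peer ratings are invented for illustration, and the choice of Spearman's coefficient is an assumption of this sketch rather than a method prescribed in the studies cited.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data for six documents.
CITATION_COUNTS = [120, 45, 60, 10, 5, 30]
PEER_RATINGS = [9, 6, 8, 4, 5, 3]

rho = spearman(CITATION_COUNTS, PEER_RATINGS)
print(f"Spearman rank correlation = {rho:.3f}")
```

A coefficient near 1 would indicate that the two indicators order the documents in much the same way; it would not by itself show that either one measures quality.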
3. Citations are made to the best possible works. One can better
understand the nature of citations if one knows the population from
which they are selected. If one assumes that citations are made to the best
possible works, then one must imagine that authors sift through all of
the possible documents that could be cited and carefully select those
judged best. But studies of science information use have suggested that
accessibility may be as important a factor as quality in the selection of an
information source. Soper conducted a study to investigate the effect of
physical accessibility upon the selection and use of references.³¹ She
found that the largest proportion of documents cited in authors' recent
papers was located in personal collections, a smaller proportion was
located in libraries in departments and institutions to which respondents
belonged, and the smallest proportion was located in libraries in
other cities and countries. Thus a paper might well have been cited
because it happened to be on the citer's desk rather than because it was
the ideal paper to cite. Accessibility of a document may be a function of
its form, place of origin, age, and language. If a journal article, its
accessibility may be determined by the journal's circulation, reprint
policies, and coverage by indexing and abstracting services. Just as a
document may be more or less accessible, a researcher may be more or
less visible. An author is likely to be most aware of the work of his
colleagues. Other scientists' work may come to the author's attention as
a result of their discoveries, their leadership in the scientific community,
or their activities in the world of politics and controversy.³² As with
documents, researchers cited therefore do not necessarily represent the
most outstanding in a particular field. It may be that anything which
enhances a researcher's visibility is likely to increase his citation rate,
irrespective of the intrinsic quality of his work.
4. A cited document is related in content to the citing document; if
two documents are bibliographically coupled, they are related in content; and if two documents are cocited, they are related in content. To the
extent that citation indexes can be used to retrieve relevant citing documents given a cited document, one has support for the first part of this
assumption. Additional support is found in the results of an experiment
conducted by Barlup in which authors were asked to assess the degree of
relatedness of citations to their own
The authors judged 72
percent to be definitely related, and only 5 percent to be definitely not
related. The difficulty with the second and third parts of the assumption
becomes evident when one considers an early statement by Garfield
regarding citation indexes: "If one considers the book as the macro unit
of thought and the periodical article the microunit of thought, then the
citation index in some respects deals in the submicro or molecular unit
of thought."34 Given this observation, Martyn contends that a bibliographic coupling is not a valid unit of measurement because one does
not know that two documents citing a third are citing the identical unit
of information in it.35 Thus, bibliographic coupling is merely an indication of the existence of the probability (possibly zero) of a relationship
in the content of the two documents. The same applies to cocitation as
well; the fact that two papers are cocited does not guarantee a relationship between their contents.
5. All citations are equal. This paper began with a discussion of the
problematical nature of the relationship between cited and citing documents. Yet studies using citation counts generally assume that all citations (with the possible exception of self-citations) can be weighted
equally. In recent years many investigators have sought ways to refine
citation analysis which would not necessarily treat all citations to the
same article (author, journal, etc.) as equivalent. These can be subdivided into two types of refinements: mechanical v. intellectual. Mechanical refinements require no judgment or inference; intellectual
refinements require (at least at present) human analysis.
Mechanical refinements look at easily definable properties of a
citation, such as multiple occurrence or location in a document. The
hope is that knowing this property will allow one to predict something
about the relationship between citing and cited documents. Bertram
investigated whether the level (or amount) of material actually cited by
citing articles in science journals would vary significantly with the
section of the source article in which the citation occurs.36 She identified
three levels [whole, part, word(s)] and three sections (title/introduction,
results/discussion, experimental), and found that indeed the title/introduction tended to cite whole articles, results/discussion tended to cite
only a part, and experimental tended to cite words. Thus, at least for the
articles in Bertram's study, a significant relationship does exist between
citation level and the section of the citing article in which a citation
occurs. A study reported by Herlach tested and accepted the hypothesis
that the mention of a given reference more than once within the same
research paper indicates a close and useful relationship of citing to cited
paper.37 She further noted that use of multiple mention as a retrieval
criterion would yield good precision but low recall. Voos and Dagaev
agree that location and multiple mention can be used to distinguish
citations of particular value.38 Self-citations are also readily identifiable
as a special class. Tagliacozzo completed a study to determine the extent
to which authors of scientific articles cite their previous publications
and to find the principal distinguishing features of this particular type
of citation.39 She found that self-citations were more recent than references to other authors. This suggests that conclusions about time distributions of citations would vary depending on whether or not
self-citations were included.
In contrast to mechanical refinements, intellectual refinements rely
on content analysis. As Small observes, in the last few years sociologists
of science have begun to explore the fine structure of citation practice by
examining the contexts in which citations occur, specifically the text
surrounding the footnote number.40 Many of these studies have
attempted to develop and apply classification schemes. An early classification scheme was that of Lipetz, who devised a set of indicators to
characterize the citing article as well as the kind of relationships of the
citing to the cited article.41 Several other classification schemes have
been developed in the last few years.42 Categories suggested by these
schemes include confirmative/negational (to distinguish material
judged to be good from material judged to be bad) and organic/perfunctory (to distinguish necessary citations from dispensable
ones). All these attempts at classification are useful supplements to
simple citation counts.
Rather than trying to create exhaustive classification schemes, a
more recent development is the interpretation of cited documents as
concept symbols. As Small observes, the interpretation of citations in
this way is more closely related to the way citations are used by authors
in scientific papers.43 He notes that most citations are the author's own
private symbols for certain ideas he uses. Where documents are frequently cited, their use as concept symbols may be shared by a group of
scientists. Small has recently extended this approach through the development of cocitation context analysis.44 Statements characterizing the
structure of a cocitation map are obtained from an analysis of the
contexts or passages in which documents are cocited.
The difficulty with such intellectual refinements is the time
required to apply them. Human judgment is needed to analyze citation
contexts and make inferences, so studies employing intellectual refinements are likely to be limited in scope. Nevertheless, both mechanical
and intellectual refinements offer alternatives to treating citations as
masses of undifferentiated units. Although for some applications it is
sufficient to treat citations equally, for others it is appropriate to investigate the fine structure of citation practice.
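
The two derived measures discussed under assumption 4, bibliographic coupling and cocitation, can be illustrated with a short Python sketch. The paper identifiers and reference sets below are hypothetical toy data, and the function names are illustrative rather than taken from any cited study. Coupling strength is taken here as the number of references two citing papers share; cocitation frequency as the number of papers that cite both members of a pair. As Martyn's objection makes clear, a nonzero count signals only the possibility of related content, not a guarantee.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of papers it cites.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"C"},
}


def coupling_strength(refs, p, q):
    """Bibliographic coupling: number of references shared by citing papers p and q."""
    return len(refs[p] & refs[q])


def cocitation_counts(refs):
    """Cocitation: for each pair of cited papers, count the papers that cite both."""
    counts = defaultdict(int)
    for cited in refs.values():
        # Sorting gives every pair a single canonical (smaller, larger) key.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)
```

In this toy file, P1 and P2 are coupled through B and C, while B and C are cocited by both P1 and P2. A mechanical refinement such as weighting by multiple mention could be added by replacing the sets with counted references.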
Given the difficulties with the assumptions which underlie many
citation analyses, one must also be aware of the problems which can
exist in sources of citation data. Some of these problems are characteristic of all sources of citation data, while others only pose difficulties in
the use of secondary sources, the citation indexes. Cole and Cole discuss
many of these problems and ways of handling them in statistical analysis.45 Problems include:
1. Multiple authorship. Cited articles listed in the citation indexes include only the first-named authors. To find all citations to publications of a given author, including those in which he is not first author, one needs a bibliography of his works so that all articles can be
checked in the citation index. Errors can be introduced unless such
complete counts are made.46 There is also the problem of allocating
credit in multiauthored works.47 Should such works be treated the
same as single-authored works in citation counts or should credit be
divided proportionally? Should one consider the sequence of author
names in allocating credit, as this sequence often is an indication of
the contribution of each author to the work reported?
2. Self-citations. If self-citations are to be eliminated from citation
counts, this is easily done for papers written by a single author.
Again, multiauthored papers may require further checking. An even
more difficult problem is to eliminate group self-citations, i.e., references from any member(s) of a research group to any other member(s)
of that research group. In this case one would have to find a source
identifying all members of the research group.
3. Homographs. Many scientists with the same name and initials could
be publishing in the same field. To differentiate among them, additional information such as institutional affiliation is needed. Otherwise citations could be attributed incorrectly to an author, particularly if he has a common name.
4. Synonyms. Citations will be scattered unless a standard form for the
author name can be established. Examples of synonyms in the
context of citation indexes include an author's name with a variable
number of initials (e.g., Licklider, J.; Licklider, J.C.; Licklider,
J.C.R.), a woman's maiden and married names, different treatments
of foreign names, and misspellings. Although ISI's editing programs
manage to reconcile many of the differences introduced by citing
authors, variations still occur.48 Journal names may also create synonym problems when the task is to identify citations of articles
appearing in a particular journal. In addition to variations in the
abbreviated form for a given title, journals merge, split into new
journals, change titles, and appear in translation. There is a need to
establish which forms are equivalent for the purposes of citation
analysis.
5. Types of sources. The type(s) of sources used in a citation analysis can
influence the results, as demonstrated in a study by Line in the social
sciences.49 Analyses of references drawn from journals and monographs showed differences, some of them large, in date distributions,
forms of material cited, subject self-citation and citations beyond the
social sciences, and countries of publication cited. Line concludes
that any citation analyses that are based on only a limited number
and type of sources without specific justification must be regarded
with suspicion. Oromaner notes that authors of any type of literature
are advised to keep their audience in mind when writing, so materials
for different types of audiences may have differing citation patterns.50
Citation data found in the citation indexes are drawn from many
journals and selected monographs which are international in scope
and from a variety of disciplines. Although the citation indexes do
not seriously suffer from limitations in number of sources, they are
limited in type. This is not a hindrance where journals within a field
give a complete and accurate reflection of all important aspects of
scholarship. Brittain and Line describe advantages and disadvantages of various sources of citations for analysis.51 Choice of
types and numbers of sources should depend on the purpose of the
analysis.
6. Implicit citations. Most citation analyses consider only explicit citations, and these are what is generally made available in citation
indexes as well. An exception is the A&HCI, which includes implicit
citations when an article refers to and substantially discusses a work
but fails to include an explicit citation.52 But implicit citations are
also frequently found in the form of eponyms in the scientific literature. Furthermore, papers containing important ideas will not necessarily continue to be highly cited. Once an idea is sufficiently widely
known, citing the original version is unnecessary. If one were using
citation analysis to measure the impact of an individual author, such
implicit citations would fail to be included.
7. Fluctuations with time. There may be large variations in citation
counts from one year to another, so citation data should not be too restricted in time.
8. Field variations. Citation rates (citations per publication) vary greatly
in different fields, leading to difficulties in cross-discipline comparisons. Bates has proposed the criterion rate as a refinement of citation
rate, because citation counts as a measure of the quality of a
researcher's work are influenced not only by the inherent value of
that work, but also by the size of the pool of available citers in a given
field.53 A researcher's work can be evaluated in relation to a criterion
rate of citation, the citation rate of the top researchers in that field.
9. Errors. Of course, citation analyses, including those based on citation
indexes, can be no more accurate than the raw material used.
Although processing of citations for inclusion in citation indexes
may introduce some errors while eliminating others, many errors due
to citing authors remain. These can include errors in cited author
names, journal title, page, volume, and year. The incorrect citing of
sources is unfortunately far from uncommon. Two studies found the
percentage of error for citations from various journals to range from
10.7 to 50 percent.54
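
Two of the data problems listed above, synonyms and multiple authorship, lend themselves to a small Python sketch. Everything here is an illustrative assumption rather than a procedure described in the sources cited: `author_key` reduces a name to surname plus first initial so that the Licklider variants collapse to one heading, and `fractional_credit` answers the credit question in the simplest possible way, by dividing a paper's citations equally among its authors. Note that the first step trades one problem for another: merging on surname and first initial makes the homograph problem worse, so some additional field such as institutional affiliation would still be needed.

```python
import re


def author_key(name):
    """Reduce 'Licklider, J.C.R.' to ('licklider', 'j'): surname plus first initial.

    Assumes the 'Surname, Initials' form used in citation indexes.
    """
    surname, _, initials = name.partition(",")
    letters = re.findall(r"[A-Za-z]", initials)
    first = letters[0].lower() if letters else ""
    return (surname.strip().lower(), first)


def fractional_credit(authors, citations):
    """Divide a paper's citation count equally among its authors.

    Equal division is only one possible rule; a scheme that weights by
    position in the author sequence would replace the single `share` value.
    """
    if not authors:
        raise ValueError("a paper needs at least one author")
    share = citations / len(authors)
    return {author: share for author in authors}
```

Under this rule a four-author paper cited eight times contributes two citations to each author, whereas a whole-count rule would credit each with eight.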
This section has considered two types of limitations which can
affect citation analyses: the assumptions made may not be true, and the
data collected may have inadequacies. Invalid conclusions will be made
unless these limitations are taken into account in the design of a study
and in the interpretation of results. The most reliable results may be
expected when citation abuses and errors appear as noise under conditions of high signal to noise ratio, i.e., the noise represents only a
relatively small number of the citations analyzed.55 The limitations of
citation analysis do not negate its value as a research method when used
with care. There are, in fact, several application areas where citation
analysis has been used successfully.


Applications
The applications described in this section reflect two major
themes: use of citations as tools for the librarian and use of citations as
tools to analyze research activity. Citations and cocitations are part of
the range of empirical data available to historians and sociologists of
science, as well as to librarians. For each application area, representative
studies are mentioned to illustrate the types of questions which have
been investigated through citation analysis. In addition, weaknesses of
the method are identified, reflecting points made in the critique above.
1. Literature of studies. In this case one looks at citations in a
particular subject area to describe patterns of citation. The sources of
citation data may be as limited as a single journal in the field (e.g.,
#ens study of references in articles appearing in the Bulletin of the
Medical Library Association56), or they may encompass many sources,
including types of material in addition to journals. Characteristics of
cited materials frequently examined include types, age, highly cited
authors and journals, languages and countries of origin, and subject
distributions.57 This type of study may also look for changes in these
characteristics over time. A major problem with these studies is their
lack of compatibility which makes comparisons and synthesis difficult.
One application which has been suggested for this type of study is the
definition of appropriate secondary service coverage and scope of retrospective bibliographies in a given subject area.58 By studying the range
of subjects, countries, languages, and document forms referred to by a
group of known core sources, one can begin to establish the boundaries
of a subject literature, with the limitation that citations do not reflect all
literature use. The value of this method in the determination of current
policies is a function of the extent to which these data can be projected
forward in time. Bibliographic coupling and cocitation have been used
to create mappings of the micro- and macrostructures and relationships
of disciplines.59 Small, for example, has used cocitation analysis to
explore the relationship of information science to the social sciences.60
2. Type of literature studies. Citation analysis can be used to
gauge the dissemination of results reported in certain types of literature,
such as government documents, dissertations, or the exchange literature
of regional scientific societies.61 The source of citations used for analysis
clearly can determine the generality of one's conclusions in this type of
study. Nelson, in a study of citations to art collection catalogs, remarks
that one must recognize the potential usefulness of what she terms
self-styled citation methods.62 In her case, citation analysis of the fine
arts nonserial literature was the appropriate approach. Such studies can
involve content analysis, documenting not only where but also how
certain types of literature have been used.
3. User studies. Although studies in this category are descriptive,
they have implications for collection development and design of services. One approach is the analysis of reference lists in works written by
library users, e.g., term papers, theses/dissertations or technical reports,
in order to determine types of materials, age of materials, subject,
language, and whether locally owned.63 An alternative approach is to
test a specific hypothesis about information use, e.g., scientific literature is little used by engineers, or academic researchers use different
information sources than practitioners.64 It should be noted that citation analysis can be used to compare user behavior today with user
behavior several years ago, with the understanding that citations do not
strictly parallel use.
4. Historical studies. Historical research using citation analysis is
based on a literary model of the scientific process.65 In this model
scientific work is represented by papers written and published to report
it, and relationships between discrete pieces of work are represented by
references in papers. Citations can be used to trace the chronology of
events, relationships among them, and their relative importance. Missing and implicit citations obviously pose problems for such an analysis. The subject of study may range from the influence of a single idea
(e.g., Smith's investigation of the influence of Vannevar Bush's memex
on subsequent research and development in information retrieval) to an
individual's entire scientific career (e.g., Ruff's study of Istvan
Kovacs).66 Patent citation networks offer a novel technique for displaying the history of a technical field.67 The changes in patterns of
cocitation from year to year can reveal something about the history of
ideas in a given specialty.68 Patterns found through such an analysis can
be validated through interviews with specialists and questionnaire surveys, as in Small's longitudinal study of collagen research.69 Finally,
cocitation context analysis has been proposed as a means for elucidating
the structure of paradigms, the consensual structure of concepts in a
field.70
5. Communication patterns. Citations can be thought of as plausible indicators of scientific communication patterns. Although citation
linkages do not necessarily reflect social contacts, it is probable that
there is a certain amount of congruence between documental and social
structures. Of particular interest is the analysis of these patterns to
identify problem areas in communication. These could include linguistic isolation, limited dissemination of new ideas, and barriers between
basic and applied science or between specialists and the public at large.
Shepherd and Goode, for example, sought to determine whether
research workers quoted in newspapers were really representative of
their respective fields.71 They examined whether authors quoted in
newspapers were also highly cited by their peers.
6. Evaluative bibliometrics. In these studies, citation analysis is
defined as the evaluation and interpretation of the citations received by
articles, scientists, universities, countries, and other aggregates of scientific activity, used as a measure of scientific influence and productivity.72
Although there is much about the meaning of citation rates that is not
yet known (e.g., factors affecting rates, variation from field to field),
citation analysis is being used with increasing frequency as an evaluative tool by science administrators.73
7. Information retrieval. Use of citation relations has perhaps had
the greatest impact in information retrieval where citations have been
used to augment more traditional approaches to literature searching.
Experiments by Salton have confirmed that citations are useful supplements to keywords in identifying relevant documents.74 Citation relations have been used in developing document representations, in
automatic classification, and in various retrieval algorithms which
make use of the ability to find like documents in the file independent
of words and language.75 Citations as a retrieval tool have the advantages that they are unaffected by changing terminology, they provide
access to interdisciplinary literature, and they reveal papers relevant to a
subject not found by using conventional indexes. Extensive use of
citations in computer-based retrieval has been hindered by a lack of
systems tailored specifically for citation manipulation. This may not
prove to be a barrier in the future, however. Yermish describes an
interactive information retrieval system which he developed to manipulate citation relations existing among bibliographic records efficiently.76 Each document record has an associated REFLIST (list of all
documents that have been cited by a given document) and CITELIST
(list of all subsequent documents that cite a given document). These
allow one to use direct citation and citation coupling search modes in
addition to the more conventional keyword search. Two recent papers
describe the use of cocitation as a search strategy to retrieve documents
relevant to a given topic using commercially available search systems
and the citation index data bases.77 Both cocited author and cocited
document searches are possible. Garfield has announced the pilot testing of BIOMED SEARCH, a retrieval system based on research front
specialties defined through cocitation clustering.78 Finally, O'Connor
has investigated procedures for the computer identification of citing
statements found in documents for which the full text is available in
machine-readable form, so that a retrieved set could include not only the
identification of citing documents but also the citing statements themselves.79 As citation relations are more actively exploited for literature
search purposes, it should be possible to develop a better understanding
of the reasons for success and failure in this application area.
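
The REFLIST and CITELIST structures attributed to Yermish above can be pictured with a minimal Python sketch. This is not a reconstruction of his system: the class name, method names and in-memory dictionaries are assumptions made purely for illustration. The point is that once every document record carries both lists, the direct citation and citation coupling search modes reduce to simple lookups.

```python
from collections import defaultdict


class CitationFile:
    """Toy bibliographic file in which each document has a REFLIST and a CITELIST."""

    def __init__(self):
        self.reflist = defaultdict(set)   # document -> documents it cites
        self.citelist = defaultdict(set)  # document -> documents that cite it

    def add(self, doc, cited_docs):
        """Record a document and its references, keeping both lists in step."""
        for cited in cited_docs:
            self.reflist[doc].add(cited)
            self.citelist[cited].add(doc)

    def direct(self, doc):
        """Direct citation search: the documents that cite `doc`."""
        return set(self.citelist.get(doc, set()))

    def coupled(self, doc):
        """Citation coupling search: other documents sharing references with `doc`.

        Returns a mapping from each coupled document to the number of shared references.
        """
        shared = defaultdict(int)
        for ref in self.reflist.get(doc, set()):
            for other in self.citelist[ref]:
                if other != doc:
                    shared[other] += 1
        return dict(shared)
```

A cocitation search could be added in the same style by walking from a cited document through its CITELIST to the REFLIST of each citing document.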
8. Collection development. It is appropriate to begin the discussion
of citation analysis as a tool for collection development with Cayless's
observation that the main purpose of quantitative measures is to
provide information on which to base qualitative judgments, not to
replace them.80 Citation analysis has been applied primarily to the
development of journal collections, where decisions to be made include:
to acquire or not acquire a particular title, to continue or discontinue a
subscription, to weed or not to weed a backset. Beginning with a study
by Gross and Gross published in 1927 which used citation frequency as a
measure of journal significance, citation analysis has been advocated as
a tool in journal evaluation. This application has not been without
critics. Brodman was perhaps the first to test the assumptions which
underlie the method: (1) the value of a periodical to a professional
worker is in direct proportion to the number of times it is cited in the
professional literature; (2) the journal(s) used as a source of citations
is (are) representative of the entire field; and (3) if more than one journal
is used as a source of citation data, all can be weighted equally.82 She did
not find support for these assumptions, and concluded that results of the
method should be used with caution. Others question journal rankings
by citation counts because such rankings may bear little relation to the
frequency of journal use in a particular library, as citation analysis and
use analysis measure different activities.83 The difference in results of
use studies in different libraries suggests the limited value of a generalized technique such as citation analysis. In addition, there is the problem of noncited journals, such as trade and technical journals and
professional magazines.84 Line and Sandison discourage the use of
citation counts, instead advocating journal uses per unit of expenditure
(purchase, processing, binding, storage) as a basis for selection and
journal uses per unit of shelf space occupied as a basis for discarding.85
In spite of these criticisms, there is still a place for citation analysis
as a tool in collection development. Even though he disapproves of the
use of citation analyses in general, Line does acknowledge three uses to
which ranked lists derived from citation counts can be put: (1) highly
ranked journals not available locally and within subject scope are worth
examining in more detail; (2)low-ranked journals that are taken locally
should likewise be examined; and (3) lists based on source journals in a
particular subject can indicate journals outside of that subject which
may not yet have been acquired but may be valuable for local users.86 In
his review of the applications of citation analysis to library collection
building, Broadus concludes that in the absence of highly expert subject
specialists on a library staff, citation studies can be of considerable value
in choosing serials and even monographs.87 Given the uncertainties
involved in using citation counts in isolation, it is appropriate to
consider their use in combination with other measures, as in the model
for journal selection which gives highest priority to journals found to be
highly cited, abstracted and used.88 Although a tool like JCR gives
citation rankings based on a large body of literature, librarians may also
analyze citations found in their users' publications, as described above
under user studies. Kriz, for example, analyzed reference lists in
engineering theses. Finding books to be more frequently used than
journals, he shifted funds from journal subscriptions to purchase more
books. Citations are indicators of use, but there is probably a need for
multiple indicators, as demand does not strictly parallel citation. Many
materials are borrowed and read but not cited; authors who cite are only
a subset of the total reading public. Other measures of use such as
in-house use, circulation and interlibrary loan can be used to supplement citation analysis in developing a more comprehensive view of user
needs as a basis for collection development.
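
The alternative measures attributed to Line and Sandison in the collection development discussion, journal uses per unit of expenditure and journal uses per unit of shelf space, amount to two simple ratios. The Python sketch below shows one way they might be turned into ranked lists; the record layout, field names and units are hypothetical, and the sketch assumes use counts have already been gathered by some local use study.

```python
from typing import NamedTuple


class Journal(NamedTuple):
    title: str
    uses: int            # recorded uses during the study period
    cost: float          # purchase, processing, binding and storage
    shelf_metres: float  # shelf space occupied by the backset


def rank_for_selection(journals):
    """Order titles by uses per unit of expenditure, highest first."""
    return sorted(journals, key=lambda j: j.uses / j.cost, reverse=True)


def rank_for_discarding(journals):
    """Order titles by uses per unit of shelf space, lowest first."""
    return sorted(journals, key=lambda j: j.uses / j.shelf_metres)
```

A heavily used but expensive title can rank below a modestly used cheap one for selection, which is exactly the kind of local information a citation ranking cannot supply.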
Future Developments
Thus far this paper has described the uses, as well as abuses, of
citation analysis. Given the increasing availability of raw material for
citation analysis (as A&HCI joins SSCI and SCI) and the development of
computer systems with which to manipulate these data easily, it is safe
to predict that citation analysis will continue to be a commonly used
technique. But the large number of studies using citation indexes has
led one critic to remark that uses of citation indexes other than for
literature searching seem to be examples of Kaplan's law of the instrument: "Give a small boy a hammer, and he will find that everything he
encounters needs pounding."90 Superficially, citation analysis appears
to be a simple technique to apply, and there is a danger that it will fall
into disrepute through uncritical or overenthusiastic use. As with any
methodology, citation analysis produces results whose validity is highly
sensitive to the skill with which it is applied.
The critique of citation analysis in this paper outlined the assumptions often made and the problems which arise in data collection. In
order to better understand the possibilities and limitations of citation
analysis, more studies which test the assumptions and explore the
problem areas are needed. Another way to strengthen studies using
citation analysis is to apply multiple methods in the study of a phenomenon, as in the coupling of citation analysis and content analysis. As no
research method is without bias, citation analysis should be supplemented by methods testing the same variables but having different
methodological weaknesses. For example, to investigate communication patterns among scientists, one could supplement citation data with
those obtained via interview or questionnaire.
Not enough is known about the citation behavior of authors: why the author makes citations, why he makes his particular citations,
and how they reflect or do not reflect his actual research and use of the
literature. When more is learned about the actual norms and practices
involved, we will be in a better position to know whether (and in what
ways) it makes sense to use citation analysis in various application
areas.91 It would also be interesting to study in more detail the characteristics of documents which do not cite and/or are not cited, and to
identify characteristics of documents which can be used to predict
citedness.92
Advances in theory and practice have marked the development of
citation analysis, and researchers are likely to continue contributing in
both these areas. Gilbert, for example, has proposed a theory of citing
which views referencing as persuasion.93 In practice, simple citation
counts have been supplemented by bibliographic coupling, cocitation
analysis, evaluative bibliometrics, and cocitation context analysis. Garfield recently noted that one of the major methodological changes in his
studies in the near future will be to shift from counting citations to
counting authors influenced by.94
To conclude this paper, two questions affecting the future of citation analysis will be posed. Is it possible that increased use of citation
analysis will cause a change in citation behavior? How will citation
behavior be affected by the increased use of electronic media for generation, storage and dissemination of information? Although both questions have already received some attention in the literature, the
responses to them are necessarily somewhat speculative.
It has been suggested that the very existence of citation indexes and
the growing abundance of citation analyses will likely have various
feedback influences on the writing and citing habits of future authors.95
Just as authors may title their papers more carefully to ensure their
retrievability through keyword indexes, authors could be motivated to
acknowledge their intellectual debts to prior documents accurately, lest
their papers go undetected by the user of a citation index. Thus this
paper is titled "Citation Analysis" rather than the more metaphorical
"Standing on the Shoulders of Giants," and care has been taken to
reference accurately works by Garfield, Small and other key researchers
in citation analysis, as well as to include one self-citation. In an article
on the ethics of scientific publication, Price asserts that now that citations to previous work have become a valuable tool for literature
indexing, referees and editors should summarily reject bibliographies
that are either insufficient or padded.96 Fears have been expressed
regarding the possibilities for abuse: "[I]t might create a bandwagon
effect whereby authors who wish their document to be used will cite, and
try to get cited by, the most popular documents. This would be an
aberration, a disease of the information system."97
Whether or not such feedback influences are felt, other changes are
likely to come with the increased use of electronic media for information
handling. The first question which arises is the form of bibliographic
references for material available in machine-readable form. Proposals
have already been put forward for both data files and computer conference comments.98 Questions of quality control, accessibility and
author's permission must be addressed before the latter can be handled
as conventional publications. Whether the technological changes available to the next generation of researchers will undermine the role of the
paper in the process of scholarship remains to be seen. What is already
available are information facilities for electronic publishing and document handling such as the Xanadu Hypertext System.99 The basic unit
of this service is the windowing document. With the full text of documents available in machine-readable form, a reader may either explore a
document or step through the window to explore the next document,
such as one referred to in a footnote. After exploring a further document,
the reader may return to the one that showed him to it, or proceed on
tangents that become available. Thus the links which citations represent are converted to electronic form, and new possibilities for citation
analysis arise. One can also imagine the use of graphics devices for the
display of citation networks and cluster maps.
This paper began with a quotation from Newton, the image of
science advancing by standing on the shoulders of giants. In fact: "the
process by which the boundaries of knowledge are advanced, and the
structure of organized science is built, is a complex process
indeed....[T]he whole effort is highly unorganized. There are no direct
orders from architect or quarrymaster. Individuals and small bands
proceed about their businesses unimpeded and uncontrolled, digging
where they will, working over their material, and tucking it into place
in the edifice."100 Perhaps the greatest potential contribution of citation
analysis lies in the new insights which it can offer into this process. It is
a process which concerns not only scientists and sociologists of science,


but also those who work with the literature of science.

References
1. Isaac Newton. Quoted in Robert K. Merton. On the Shoulders of Giants: A
Shandean Postscript. New York: Free Press, 1965.
2. Ziman, John M. Public Knowledge: An Essay Concerning the Social Dimension
of Science. Cambridge: Cambridge University Press, 1968, p. 58.
3. Narin, Francis, et al. Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity. Cherry Hill, N.J.: Computer
Horizons, Inc., 1976, pp. 334, 337. (PB 252 339)
4. Malin, Morton V. The Science Citation Index: A New Concept in Indexing.
Library Trends 16(Jan. 1968):376.
5. Gupta, B.M., and Nagpal, M.P.K. Citation Analysis and Its Applications: A Review. Herald of Library Science 18(Jan.-April 1979):86-93; Hall, Angela M. The Use and
Value of Citations: A State-of-the-Art Report. London: Information Service in Physics,
Electrotechnology and Control, 1970 (R70/4); Hjerppe, Roland. An Outline of Bibliometrics and Citation Analysis (TRITA-LIB-6014). Stockholm: Royal Institute of Technology Library, 1978. (ED 167 077); Martyn, John. Citation Analysis. Journal of
Documentation 31(Dec. 1975):290-97; Miller, Elizabeth, and Truesdell, Eugenia. Citation Indexing: History and Applications. Drexel Library Quarterly 8(April 1972):159-72; and Mitra, A.C. The Bibliographical Reference: A Review of Its Role. Annals of
Library Science and Documentation 17(Sept.-Dec. 1970):117-23.
6. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing &
Analysis (TRITA-LIB-2013). Stockholm: Royal Institute of Technology Library, 1980.
7. Garfield, Eugene. Citation Indexing-Its Theory and Application in Science,
Technology, and Humanities. New York: Wiley, 1979.
8. ______. Essays of an Information Scientist, 3 vols. Philadelphia: Institute
for Scientific Information Press, 1977, 1980.
9. ______. Can Citation Indexing Be Automated? In Statistical Association
Methods for Mechanized Documentation (NBS Misc. Pub. 269), edited by Mary E. Stevens,
et al., p. 189. Washington, D.C.: National Bureau of Standards, 1965.
10. Bavelas, Janet B. The Social Psychology of Citations. Canadian Psychological
Review 19(April 1978):lO.
11. Kochen, Manfred. Principles of Information Retrieval. Los Angeles: Melville,
1974, p. 74.
12. Webb, Eugene J., et al. Unobtrusive Measures: Nonreactive Research in the
Social Sciences. Chicago: Rand McNally, 1966.
13. Pinski, Gabriel, and Narin, Francis. Citation Influence for Journal Aggregates
of Scientific Publications. Information Processing and Management 12(1976):297-312.
14. Kessler, M.M. An Experimental Study of Bibliographic Coupling Between
Technical Papers. IEEE Transactions on Information Theory IT-9(Jan. 1963):49-51;
and ______. Bibliographic Coupling Between Scientific Papers. American Documentation 14(Jan. 1963):10-25.
15. Marshakova, I.V. A System of Document Links Constructed on the Basis of Citations (According to the Science Citation Index). Automatic Documentation and
Mathematical Linguistics 7(1973):49-57. (English translation of article in Nauchno
Tekhnicheskaya Informatsiya Seriya 2, no. 6, pp. 3-8, 1973); and Small, Henry. Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two
Documents. Journal of the ASIS 24(July-Aug. 1973):265-69.
16. Garfield, Eugene, et al. Citation Data as Science Indicators. In Toward a
Metric of Science: The Advent of Science Indicators, edited by Yehuda Elkana, et al., p.
185. New York: Wiley, 1978.
17. Weinberg, Bella H. Bibliographic Coupling: A Review. Information Storage
and Retrieval 10(May-June 1974):189-96; and Bellardo, Trudi. The Use of Co-Citations
to Study Science. Library Research 2(Fall 1980):231-37.
18. Small, Henry, and Griffith, Belver C. The Structure of Scientific Literatures. I:
Identifying and Graphing Specialties. Science Studies 4(Jan. 1974):17-40.
19. Weinstock, Melvin. Citation Indexes. In Encyclopedia of Library and Information Science, vol. 5, edited by Allen Kent, et al., pp. 16-40. New York: Marcel Dekker,
1971.
20. Garfield, Eugene. The New ISI Journal Citation Reports Should Significantly
Affect the Future Course of Scientific Publication. In ______, Essays, vol. 1, pp.
473-74.
21. Small, Henry, and Greenlee, Edwin. A Citation and Publication Analysis of U.S.
Industrial Organizations. Philadelphia: Institute for Scientific Information, 1979.
22. Abt, Helmut A. The Cost-Effectiveness in Terms of Publications and Citations
of Various Optical Telescopes at the Kitt Peak National Observatory. Publications of the
Astronomical Society of the Pacific 92(June 1980):249-54.
23. Griffith, Belver C., et al. On the Use of Citations in Studying Scientific Achievements and Communication. Society for Social Studies of Science Newsletter 2(Summer
1977):9-13; and Garfield, Citation Indexing, pp. 240-52.
24. Foskett, Anthony C. The Subject Approach to Information. 3d ed. Hamden,
Conn.: Linnet Books, 1977, p. 52.
25. Bottle, R.T. Information Obtainable from Analyses of Scientific Bibliographies. Library Trends 22(July 1973):71.
26. Davies, David. Citation Idiosyncrasies, letter to the editor. Nature 228(26 Dec.
1970):1356.
27. Edwards, Shirley A., and McCamey, Michael W. Measuring the Performance of
Researchers. Research Management 16(Jan. 1973):34-41.
28. Bayer, Alan E., and Folger, John. Some Correlates of a Citation Measure of Productivity in Science. Sociology of Education 39(Fall 1966):381.
29. Thorne, Frederick C. The Citation Index: Another Case of Spurious Validity.
Journal of Clinical Psychology 33(Oct. 1977):1157-61.
30. Virgo, Julie A. A Statistical Procedure for Evaluating the Importance of Scientific Papers. Library Quarterly 47(Oct. 1977):415-30; McAllister, Paul R., et al. Comparison of Peer and Citation Assessment of the Influence of Scientific Journals. Journal of
the ASIS 31(May 1980):147-52; and Smith, Richard, and Fiedler, Fred E. The Measurement of Scholarly Work: A Critical Review of the Literature. Educational Record
52(Summer 1971):225-32.
31. Soper, Mary E. Characteristics and Use of Personal Collections. Library Quarterly 46(Oct. 1976):397-415.
32. Goodell, Rae. The Visible Scientists. Boston: Little, Brown, 1977, p. 4.
33. Barlup, Janet. Relevancy of Cited Articles in Citation Indexing. Bulletin of the
Medical Library Association 57(July 1969):260-63.
34. Garfield, Eugene. Citation Indexes for Science. Science 122(15 July 1955):108.
35. Martyn, John. Bibliographic Coupling. Journal of Documentation 20(Dec.
1964):236.
36. Bertram, Sheila J.K. The Relationship Between Intra-Document Citation Location and Citation Level. Ph.D. diss., University of Illinois at Urbana-Champaign, 1970.
37. Herlach, Gertrud. Can Retrieval of Information From Citation Indexes Be Simplified? Journal of the ASIS 29(Nov. 1978):308-10.
38. Voos, Henry, and Dagaev, Katherine S. Are All Citations Equal? Or, Did We Op.
Cit. Your Idem? Journal of Academic Librarianship 1(Jan. 1976):19-21.
39. Tagliacozzo, Renata. Self-Citations in Scientific Literature. Journal of Documentation 33(Dec. 1977):251-65.

40. Small, Henry G. Cited Documents as Concept Symbols. Social Studies of
Science 8(Aug. 1978):327.
41. Lipetz, Ben-Ami. Improvement of the Selectivity of Citation Indexes to Science
Literature Through Inclusion of Citation Relationship Indicators. American Documentation 16(April 1965):81-90.
42. Chubin, Daryl E., and Moitra, Soumyo D. Content Analysis of References: Adjunct or Alternative to Citation Counting? Social Studies of Science 5(Nov.
1975):423-41; Frost, Carolyn O. The Use of Citations in Literary Research: A Preliminary Classification of Citation Functions. Library Quarterly 49(Oct. 1979):399-414; Moravcsik, Michael
J., and Murugesan, Poovanalingam. Some Results on the Function and Quality of
Citations. Social Studies of Science 5(Feb. 1975):86-92; Murugesan, Poovanalingam, and
Moravcsik, Michael J. Variations of the Nature of Citation Measures with Journals and
Scientific Specialties. Journal of the ASIS 29(May 1978):141-47; Oppenheim, Charles,
and Renn, Susan P. Highly Cited Old Papers and the Reasons Why They Continue to Be
Cited. Journal of the ASIS 29(Sept. 1978):225-31; and Spiegel-Rosing, Ina. Science
Studies: Bibliometric and Content Analysis. Social Studies of Science 7(Feb. 1977):97-113.
43. Small, Cited Documents, p. 328.
44. Small, Henry G. Co-Citation Context Analysis. Proceedings of the ASIS Annual Meeting 16(1979):270-75.
45. Cole, Jonathan, and Cole, Stephen. Measuring the Quality of Sociological Research: Problems in the Use of the Science Citation Index. American Sociologist 6(Feb.
1971):23-29.
46. Long, J. Scott, et al. The Problem of Junior-Authored Papers in Constructing
Citation Counts. Social Studies of Science 10(May 1980):127-43.
47. Lindsey, Duncan. Production and Citation Measures in the Sociology of
Science: The Problem of Multiple Authorship. Social Studies of Science 10(May
1980):145-62.
48. Garfield, Eugene. What's in a Surname? Current Contents 13(16 Feb. 1981):5-9.
49. Line, Maurice B. The Influence of the Type of Sources Used on the Results of
Citation Analyses. Journal of Documentation 35(Dec. 1979):265-84.
50. Oromaner, Mark J. The Audience as a Determinant of the Most Important
Sociologists. American Sociologist 4(Nov. 1969):332-35.
51. Brittain, J. Michael, and Line, Maurice B. Sources of Citations and References
for Analysis Purposes: A Comparative Assessment. Journal of Documentation 29(March
1973):72-80.
52. Garfield, Eugene. Will ISI's Arts & Humanities Citation Index Revolutionize
Scholarship? In ______, Essays, vol. 3, pp. 204-08.
53. Bates, Marcia J. A Criterion Citation Rate for Information Scientists. Proceedings of the ASIS Annual Meeting 17(1980):276-78.
54. Boyce, Bert R., and Banning, Carolyn S. Data Accuracy in Citation Studies.
RQ 18(Summer 1979):349-50; and Goodrich, June E., and Roland, Charles G. Accuracy
of Published Medical Reference Citations. Journal of Technical Writing and Communication 7(1977):15-19.
55. Cawkell, A.E. Citations as Sociological and Scientific Indicators-A Review.
In EURIM II: A European Conference on the Application of Research in Information
Services and Libraries, edited by W.E. Batten, pp. 31-39. London: Aslib, 1977.
56. Chen, Ching-Chih. A Citation Analysis of the Bulletin of the Medical Library
Association. Bulletin of the Medical Library Association 65(April 1977):BO-92.
57. Friis, Th. The Use of Citation Analysis as a Research Technique and Its Implications for Libraries. South African Libraries 23(July 1955):12-15.
58. Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. Hamden,
Conn.: Linnet Books, 1978.
59. Griffith, Belver C., et al. The Structure of Scientific Literatures. II: Toward a
Macro- and Microstructure for Science. Science Studies 4(Oct. 1974):339-65.
60. Small, Henry. The Relationship of Information Science to the Social Sciences:
A Co-Citation Analysis. Information Processing and Management 17(1981):39-50.
61. Goehlert, Robert. A Citation Analysis of International Organization: The Use
of Government Documents. Government Publications Review 6(1979):185-93; O'Connor, Mary A. Dissemination and Use of Library Science Dissertations in the Periodicals
Indexed in the Social Sciences Citation Index. Ph.D. diss., Florida State University, 1978;
and Gibson, Sarah S. Some Characteristics of the Exchange Literature of Regional
Scientific Societies. Library Research 2(Spring 1980-81):75-81.
62. Nelson, Diane M. Methods of Citation Analysis in the Fine Arts. Special Libraries 68(Nov. 1977):390-95.
63. Mancall, Jacqueline C., and Drott, M. Carl. Materials Used by High School
Students Preparing Independent Study Projects: A Bibliometric Approach. Library
Research 1(Fall 1979):223-36; Popovich, Charles J. The Characteristics of a Collection
for Research in Business/Management. College & Research Libraries 39(March
1978):110-17; and Hockings, E.F. Selection of Scientific Periodicals in an Industrial
Research Library. Journal of the ASIS 25(March-April 1974):131-32.
64. Waldhart, Thomas J. Utility of Scientific Research: The Engineer's Use of the
Products of Science. IEEE Transactions on Professional Communication PC-17(June
1974):33-35; and Culnan, Mary J. An Analysis of the Information Usage Patterns of
Academics and Practitioners in the Computer Field. Information Processing and Management 14(1978):395-404.
65. Garfield, Citation Indexing, p. 81.
66. Smith, Linda C. Memex as an Image of Potentiality in Information Retrieval
Research and Development. In Information Retrieval Research. London: Butterworths,
1981; and Ruff, Imre. Citation Analysis of a Scientific Career. Social Studies of Science
9(Feb. 1979):81-90.
67. Ellis, P., et al. Studies on Patent Citation Networks. Journal of Documentation 34(March 1978):12-20.
68. Small, Henry G. Structural Dynamics of Scientific Literature. International
Classification 3(1976):67-74.
69. ______. A Co-Citation Model of a Scientific Specialty. Social Studies of
Science 7(May 1977):139-66.
70. ______. Co-Citation Context Analysis and the Structure of Paradigms.
Journal of Documentation 36(Sept. 1980):183-96.
71. Shepherd, Robert G., and Goode, Erich. Scientists in the Popular Press. New
Scientist 76(24 Nov. 1977):482-84.
72. Narin, Evaluative Bibliometrics, p. 334.
73. Aaronson, Steve. The Footnotes of Science. Mosaic 6(March/April 1975):22-27;
and Wade, Nicholas. Citation Analysis: A New Tool for Science Administrators.
Science 188(2 May 1975):429-32.
74. Salton, Gerard. Associative Document Retrieval Techniques Using Bibliographic Information. Journal of the ACM 10(Oct. 1963):440-57.
75. Gray, W.A., and Harley, A.J. Computer Assisted Indexing. Information
Storage and Retrieval 7(Nov. 1971):167-74; Kwok, K.L. The Use of Title and Cited Titles
as Document Representation for Automatic Classification. Information Processing and
Management 11(1975):201-06; Price, Nancy, and Schiminovich, Samuel. A Clustering
Experiment. Information Storage and Retrieval 4(Aug. 1968):271-80; Schiminovich,
Samuel. Automatic Classification and Retrieval of Documents by Means of a Bibliographic Pattern Discovery Algorithm. Information Storage and Retrieval 6(May
1971):417-35; Bichteler, Julie, and Parsons, Ronald G. Document Retrieval by Means of
an Automatic Classification Algorithm for Citations. Information Storage and Retrieval
10(July/Aug. 1974):267-78; Bichteler, Julie, and Eaton, Edward A. Comparing Two
Algorithms for Document Retrieval Using Citation Links. Journal of the ASIS 28(July
1977):192-95; and ______. The Combined Use of Bibliographic Coupling and
Cocitation for Document Retrieval. Journal of the ASIS 31(July 1980):278-82.

76. Yermish, Ira. A Citation-Based Interactive Associative Information Retrieval
System. Ph.D. diss., University of Pennsylvania, 1975.
77. Chapman, Janet, and Subramanyam, K. Cocitation Search Strategy.
In National Online Meeting Proceedings-1981, compiled by Martha E. Williams and
Thomas H. Hogan, pp. 97-102. Medford, N.J.: Learned Information, 1981; and White,
Howard D. Cocited Author Retrieval Online: An Experiment with the Social Indicators
Literature. Journal of the ASIS 32(Jan. 1981):16-21.
78. Garfield, Eugene. ISI's On-line System Makes Searching So Easy Even a Scientist Can Do It. Current Contents 13(26 Jan. 1981):5-8.
79. O'Connor, John. Citing Statements: Recognition by Computer and Use to
Improve Retrieval. Proceedings of the ASIS Annual Meeting 17(1980):177-79.
80. Cayless, C.F. Journal Ranking and Selection, letter to the editor.
Journal of Documentation 33(Sept. 1977):243.
81. Gross, P.L.K., and Gross, E.M. College Libraries and Chemical Education.
Science 66(28 Oct. 1927):385-89; and Garfield, Eugene. Citation Analysis as a Tool in
Journal Evaluation. Science 178(3 Nov. 1972):471-79.
82. Brodman, Estelle. Choosing Physiology Journals. Bulletin of the Medical
Library Association 32(Oct. 1944):479-83.
83. Pritchard, Alan. Citation Analysis vs. Use Data, letter to the editor. Journal of
Documentation 36(Sept. 1980):268-69.
84. Singleton, Alan. Journal Ranking and Selection: A Review in Physics. Journal
of Documentation 32(Dec. 1976):258-89.
85. Line, Maurice B., and Sandison, Alexander. Practical Interpretation of
Citation and Library Use Studies. College & Research Libraries 36(Sept. 1975):393-96.
86. Line, Maurice B. On the Irrelevance of Citation Analyses to Practical Librarianship. In EURIM II, pp. 51-53; and ______. Ranked Lists Based on
Citations and Library Uses as Indicators of Journal Usage in Individual Libraries.
Collection Management 2(Winter 1978):313-16.
87. Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship 7(1977):328.
88. Dhawan, S.M., et al. Selection of Scientific Journals: A Model.
Journal of Documentation 36(March 1980):24-32.
89. Kriz, Harry M. Subscriptions vs. Books in a Constant Dollar
Budget. College & Research Libraries 39(March 1978):105-09.
90. See Bavelas, Janet B. Comments on Buss's Evaluation of Canadian Psychology
Departments. Canadian Psychological Review 17(Oct. 1976):303; and Kaplan, Abraham.
The Conduct of Inquiry: Methodology for Behavioral Science. San Francisco: Chandler,
1964, p. 28.
91. Kaplan, Norman. The Norms of Citation Behavior: Prolegomena to the
Footnote. American Documentation 16(July 1965):179-84.
92. Ghosh, Jata S., and Neufeld, M. Lynne. Uncitedness of Articles in the Journal of
the American Chemical Society. Information Storage and Retrieval 10(Nov./Dec.
1974):365-69; Ghosh, Jata S. Uncitedness of Articles in Nature, A Multidisciplinary
Scientific Journal. Information Processing and Management 11(1975):165-69; Garfield,
Eugene. Uncitedness III-The Importance of Not Being Cited. In ______, Essays,
vol. 1, pp. 413-14; and Kuch, T.D.C. Predicting the Citedness of Scientific Papers:
Objective Correlates of Citedness in the American Journal of Physiology. Proceedings of
the ASIS Annual Meeting 15(1978):185-87.
93. Gilbert, G. Nigel. Referencing as Persuasion. Social Studies of Science 7(Feb.
1977):113-22.
94. Garfield, Eugene. Is Information Retrieval in the Arts and Humanities
Inherently Different From That in Science? Library Quarterly 50(Jan. 1980):56.
95. Margolis, J. Citation Indexing and Evaluation of Scientific Papers.
Science 155(10 March 1967):1213-19.
96. Price, Derek J. de Solla. Ethics of Scientific Publication. Science
144(8 May 1964):655-57.
97. Kochen, Principles of Information Retrieval, p. 82.


98. Dodd, Sue A. Bibliographic References for Numeric Social Science
Data Files: Suggested Guidelines. Journal of the ASIS 30(March 1979):77-82; and Crickman, Robin D. The Form and Implications of Bibliographic Citations to Computer
Conference Comments. Proceedings of the ASIS Annual Meeting 15(1978):86-88.
99. Nelson, Theodor H. Computer Lib/Dream Machines. South Bend, Ind.:
Ted Nelson, 1980, p. DM 41.
100. Bush, Vannevar. The Builders. Technology Review 47(Jan. 1945):162.


Obsolescence
D. KAYE GAPEN
SIGRID P. MILNER

OBSOLESCENCE HAS BEEN DEFINED by Line and Sandison as "the decline
over time in validity or utility of information." This concept is of
obvious interest to information theoreticians who concern themselves
with the development, career and eventual death or incorporation of
particular kinds of information. But it is also of interest to practical
librarians who administer growing collections in finite spaces. Such
librarians look to research on obsolescence to help them decide which
items to keep and which to store or discard in order to make room for
new acquisitions. Ideally for remote storage or discarding, research on
obsolescence would culminate in simple mathematical formulas which
could be applied with equal success to any and all libraries. Obsolescence research has produced many mathematical formulas, but unfortunately they have been neither simple nor universally applicable. The
best researchers are the ones who have admitted that obsolescence is a far
more complicated and more hypothetical concept than we have hoped.
Only that research which has been transmogrified into bibliofolklore ("journals can be discarded after seven years," "everyone
knows chemistry books become obsolete more slowly than physics
books") is simple, and it is generally incorrect as well, either in expression or application.
D. Kaye Gapen is Dean, University Library, University of Alabama, Tuscaloosa, and
Sigrid P. Milner is Personnel Intern, Iowa State University Library, Ames.
The concept of obsolescence has itself suffered a decline in fashion
such as may be responsible for apparent obsolescence of information in
certain fields. Gosnell's classic paper published in 1944 referred to
several earlier studies.2 But in the two succeeding decades, relatively less
was written, perhaps, as Evans has suggested, because vigorous library
building made the subject less compelling.3 In the 1970s, however, and
certainly in the 1980s, tightening budgets have resulted in a resurgence
of interest in obsolescence, including the reprinting of Gosnell's article
in 1978. Increased periodical costs have made it imperative to cancel
some subscriptions, and librarians have turned once again to obsolescence research in hopes that the concept can be employed to forecast
future use as well as to describe current or past use.
Review Articles
Two major state-of-the-art reviews summarize the research that had
been done on obsolescence prior to their publication. A two-part article
by Seymour was published in 1972.4 She considered monographs and
serials separately since obsolescence is somewhat different in each case.
She pointed out that up to that time most of the articles on obsolescence
had been written by Americans (just the opposite has been true in recent
years), and she saw the research as a response to two problems: the
publishing explosion and the concomitant lack of space. She argued
that obsolete material on the shelves is not in itself merely a neutral
factor, becoming negative only insofar as it prevents display of more
useful information, but is a definite negative because it hinders the
search for relevant material. Taylor stated along the same lines that
obsolete material may cause a loss of confidence in the library by its
users, particularly undergraduates, since only the useless material is left
on the shelf while the relevant material circulates.5 Unfortunately, this
statement assumes an absoluteness of value, that a set of books has the
same ranked usefulness to all researchers, when in fact different
researchers, and even the same researcher at different times during a
project, will rank the usefulness of particular books differently. In
addition, the alternative to having mostly less useful volumes on the
shelves would seem to be having mostly empty shelves, assuming the
number of volumes in circulation at any one time remains constant.
Most researchers, including undergraduates, would probably find some
book preferable to no book.
Trueswell's calculations have shown that 99 percent of a library's
circulation needs can be satisfied by less than half of most collections.6
But Seymour points out Trueswell's underlying assumption that the
circulation requirements of users are prime concerns of the library. Not
all libraries may wish to accept this basic assumption. And his statistical results still leave working librarians with the problem of determining which individual volumes are not being used, a problem not
necessarily made easier by increasing automation of the circulation
system. But initially, the decisions of which volumes to store or discard
were made qualitatively by experts, either faculty members or specialist
librarians. Given the effect of storage upon use, the selections became a
self-fulfilling prophecy. Stored on the assumption that they would be
less used, they were less used, perhaps because of their uselessness,
perhaps because of the deterrent effect of their storage.
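A result of the kind attributed to Trueswell above can be checked against local circulation data. The following is a minimal sketch in Python, not Trueswell's own procedure: it assumes a hypothetical list of loan counts per volume and asks what fraction of the collection, taken from the most-borrowed volume downward, accounts for a target percentage of all loans. The function name, the parameter names and the sample counts are illustrative assumptions only.

```python
def fraction_of_collection_needed(loans_per_volume, target_percent=99):
    """Fraction of volumes, most-borrowed first, whose loans together
    reach target_percent of all recorded loans.

    loans_per_volume: hypothetical list of loan counts, one per volume.
    """
    if not loans_per_volume:
        raise ValueError("empty collection")
    total = sum(loans_per_volume)
    if total == 0:
        raise ValueError("no loans recorded")

    running = 0
    needed = 0
    for loans in sorted(loans_per_volume, reverse=True):
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= target_percent * total:
            break
        running += loans
        needed += 1
    return needed / len(loans_per_volume)


# Invented loan counts for a ten-volume collection (100 loans in all).
sample_loans = [50, 30, 15, 4, 1, 0, 0, 0, 0, 0]
print(fraction_of_collection_needed(sample_loans))  # 0.4
```

Such a figure describes recorded circulation only. It says nothing about in-house use, and, as noted above, it still leaves open which individual volumes to store.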
Some recent literature has attempted to reproduce the judgments of
experts through mechanical or formulaic means without paying too
much attention to the actual validity of the judgments. Fussler and
Simon, for example, found that by analyzing functions of past use,
publication date, and language, they could achieve almost unanimous
agreement with the faculty experts in chemistry and economics.7 Past
use was an especially significant predictor of future use. But in English
literature and Germanic literature, there was great disagreement
between the experts' opinion and any of the functions. It is a little hard
to see why this is true, if in fact scientists use chiefly more recent material
which would have no past use, while scholars in the humanities use
chiefly older material with a much longer history of use; yet none of the
three factors was an accurate predictor of use. Seymour concluded that
although weeding by means of past circulation was most efficient, it was
also disproportionately most costly because of gathering the data and
changing the individual records. Weeding by publication date or age
was least efficient because some heavily used books were stored; yet
because of the ease of implementation, this method may be the most
cost-effective. A two-tiered system might become possible with such a
weeding program, and indeed might be informally put into effect by
alert pagers: the most frequently recalled stored volumes might be left in
a particular area or on a shelf more easily accessible than the general
storage area. It is unfortunate that academic libraries are not more
committed to continuous derivation of use data about their collections.
A great deal of such data could be easily gathered through the automated
circulation systems many universities now have, and would provide
practical grist for the theoretical mill. Unfortunately, too many automated systems were brought up without much concern for their research
possibilities.
In the second part of her article, Seymour pointed out that serials,
being a different format from monographs, also had a different pattern of use, especially greater in-house use. One of the biggest problems in the body
of literature about obsolescence is how to deal with in-house use. Some
studies have shown that in-house use is similar to, but greater than,
circulation. This finding will be discussed later, but even if we accept it
at face value here, it does not solve the problem for the many libraries
with noncirculating periodicals. The research has relied chiefly on
citation data to identify individual volumes or entire runs of journals
for relegation to storage. As Sandison has pointed out, citation data do
not refer to any particular library; therefore, they do not shed light on
local use patterns or local user populations. Studies by publication date,
language, number of libraries holding the serial, position on ranked
lists, and other functions demonstrate that past use is again the best
predictor of future use. Fussler and Simon have detected a "family
quality" in volumes of a serial. This means that the use patterns of the
entire serial set are alike, and the whole run should be stored or retained.
It is not clear how the effect, if any, of various kinds of special issues (the annual bibliographic issue, for example, or a single-theme issue) was allowed for, or what effect reprinting and photocopying have on
journal use. Researchers have devised a "half-life" value for scientific
journal articles. As Seymour pointed out, it might better be termed the
median citation age, since it represents the point at which half of all the
citations to an article which are going to be made have been made. The
use of this figure is not immediately apparent, since one would not wish
to discard or store a volume which had half its useful life still ahead. No
judgment can be made as to whether the first half or the second half of
the citations is more valuable; only that the first half is likely to come
more quickly. Some researchers believe that all journals older than a
certain date should be stored, while others find storage of entire runs
better, particularly if subscriptions have been canceled.
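The half-life, or median citation age, described above can be illustrated with a short computation. This is a minimal sketch in Python under stated assumptions: the counts are invented, and the full lifetime record of citations to an article is assumed to be known, which in practice it rarely is. The function name and its input format are hypothetical.

```python
def median_citation_age(citations_by_age):
    """Smallest age, in years since publication, by which at least half
    of all citations in the record had been made.

    citations_by_age[i] is the number of citations received at age i.
    """
    total = sum(citations_by_age)
    if total == 0:
        raise ValueError("no citations in the record")

    running = 0
    for age, count in enumerate(citations_by_age):
        running += count
        # 2 * running >= total tests "at least half" without fractions.
        if 2 * running >= total:
            return age


# Invented record: 100 citations spread over nine years.
sample_counts = [5, 20, 30, 20, 10, 8, 4, 2, 1]
print(median_citation_age(sample_counts))  # 2
```

In this invented record, 45 of the 100 citations arrive after the median age, which illustrates the objection raised above to discarding a volume at its half-life.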
A second review article, by Line and Sandison, strikes at the heart of
some easily made assumptions about obsolescence. They discuss a
number of reasons for changes in the use of literature over time. The
information which the literature contains may be invalid, or may be
valid but incorporated in or superseded by later work. Most interesting
of all is the case where information is valid but in a field of declining
interest or fashionableness. In each of these cases, the literature will
experience a decline in use. Much of the literature will still be of interest
to the historian of the field, even if it contains invalid information, but
use of the information qua information will decrease. In some cases, use
of literature can increase. For example, if the information was formerly
considered invalid but is later recognized as valid, if a lag in technology
or theory delays exploitation of valid information (as was the case with
movable type, for instance), or if the information is valid and in a field of
increasing interest or fashionableness, then in each of these cases the
literature will experience an increase of use. Too many researchers have
ignored the interplay of these complex factors and settled for a simple
model of linear or exponential obsolescence.
A further theoretical problem which Line and Sandison brought
out is that although information and knowledge are recorded and
communicated in documents, the relationship between document use
and information validity is by no means a direct one. A document which
is difficult to obtain may be less used although the information is
potentially useful. They stated definitely that what has been considered
the law of obsolescence (decline of use over time) is in fact nothing
more than a hypothesis still to be tested. Apparent obsolescence may be
due to a number of irrelevant factors. Literature can be used in two
different ways: for current awareness and for a basic search on some
particular topic. Obviously new literature, and perhaps especially new
journals of a particular type, will be used for both these purposes. Older
literature and archival journals will be used chiefly in the second way.
This differentiation in type of use might account for part of the obsolescence curve. The growth of literature also could affect the results.
One way in which literature has grown is in the tremendous increase in
number of publications. So many more monographs and journals are
being published now that even if the percentage that was being used
were no greater, the absolute number would be many times greater.
Other possible factors are the increase in number of journal articles per
issue, length of article or monograph, number of footnote citations or
references per article or monograph. It appears that no researcher has
attempted to come up with a statistical corrective to any bias which these
factors might introduce. One study suggested that it would be possible
to subtract literature growth (discovered by counting articles) from
apparent increase in use of more recent literature, thus deriving actual
increase, but did not actually do such a computation. In any case,
merely counting articles would probably not result in a sophisticated
adjustment factor.
The relationship between citations or references and use is another
uncertainty. Thesis advisers have long been aware of the purely ceremonial reference, made to a venerable but unused source. Similarly,
some sources are actually used in the production of research articles but
are not cited because of editorial restrictions or unwillingness to indicate indebtedness to such a source. Some uses of current-awareness tools
may lead only indirectly or not at all to research results; yet who is to say
that published research is the only use to which information can be
validly put? Journals dealing with the teaching of a particular university subject might only rarely be cited in core journals, but they might
be read and acted upon by many. This, of course, gets at the fundamental question: What do we mean by "use"?
A final basic point raised by Sandison and Line is the often ignored
distinction between synchronous and diachronous use studies. Most
studies are synchronous, since diachronous ones are time-consuming
and difficult to do; but researchers have shown that synchronous and
diachronous results need not be the same, and that in certain cases they
are markedly different. Synchronous studies are those which compare
use at a particular time to the age of the items. They might, for instance,
plot the publication dates of all items charged out from a library during
a particular period, even a lengthy period as was done in the University
of Pittsburgh study. Or they might analyze the publication dates of cited
sources for serial articles in a given year or years. Basically, such studies
look backward from a point in present time. But what we are interested
in for weeding is the use that individual titles will receive in the future.
Here a diachronous study is necessary, one which follows particular
books or articles through their useful life span. Ideally, a study like this
would trace an entire collection through its total uses, or rigorous
sampling methods could authenticate less comprehensive studies. In
practice, diachronous studies tend to be like the Fussler and Simon
study which compared the use of particular books in two five-year time
periods. A diachronous study looks forward from publication date to
the use a book will receive, and is therefore more reflective of the future
use of similar books. Diasynchronous studies would also be possible
which would compare two statistically related synchronous studies, but
such research has been rare. Line and Sandison warned that studies
based on the various citation sources must take into account fluctuations in coverage of the source, such as occurred with the first years of
Science Citation Index.
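
The difference between the two designs can be illustrated with a minimal sketch. The Python below is a hypothetical illustration, not a method taken from any study reviewed here: the loan records, the function names and the chosen years are all assumptions. A synchronous profile groups the loans of one observation year by the age of the items lent; a diachronous profile follows one publication-year cohort forward through its later loans.

```python
from collections import Counter

# Hypothetical loan records: (item_id, publication_year, year_of_loan).
LOANS = [
    ("a", 1960, 1970), ("a", 1960, 1975),
    ("b", 1968, 1970), ("b", 1968, 1971), ("b", 1968, 1975),
    ("c", 1974, 1975), ("c", 1974, 1975),
]


def synchronous_profile(loans, observed_year):
    """Look backward: count loans made in one year, grouped by item age."""
    return Counter(
        observed_year - published
        for _, published, used in loans
        if used == observed_year
    )


def diachronous_profile(loans, cohort_year):
    """Look forward: count loans of one publication cohort, grouped by age at loan."""
    return Counter(
        used - published
        for _, published, used in loans
        if published == cohort_year
    )
```

With these invented records, `synchronous_profile(LOANS, 1975)` tabulates the ages of everything lent in 1975, while `diachronous_profile(LOANS, 1968)` tabulates the ages at which the 1968 cohort was lent. The two tabulations answer different questions and need not agree.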

Other Articles
The research since these review articles has been based on three
chief sources of data: citation studies, use studies based on circulation,
and use studies based on reshelving statistics. Sandison's article on
physics journals used the same data as an earlier study by Chen.12 The
raw data presented by Chen for the use of 138 physics journals at
Massachusetts Institute of Technology (MIT) showed a rapid decrease
in use as the journal aged, but she failed to allow for the relationship of
numbers of items used to numbers available for use, in this case, meters
of shelf space. This correction for density produces quite a different
picture revealing no decline in use. Of the ten most frequently used
journals, eight conventional journals showed a peak use at twelve to
sixteen years, while two journals of advance publication peaked at six to
seven years. Further use data from the British Lending Library confirmed these findings, according to Sandison.13
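
The kind of density correction described here can be sketched as follows. This is an illustrative Python example under stated assumptions: the age bands, the use counts and the shelf lengths are invented, and the function name `use_per_meter` is hypothetical. It does not reproduce the figures of Chen or Sandison; it only shows how a raw decline in use can disappear once use is divided by the amount of material available.

```python
def use_per_meter(uses_by_age, shelf_meters_by_age):
    """Divide raw use counts by the shelf space held for each age band."""
    return {
        age: uses_by_age[age] / shelf_meters_by_age[age]
        for age in uses_by_age
        if shelf_meters_by_age.get(age, 0) > 0  # skip bands with no holdings
    }


# Hypothetical figures: raw use falls with age, but so does the shelf space held.
uses = {"0-5 years": 120, "6-10 years": 60, "11-15 years": 30}
meters = {"0-5 years": 12.0, "6-10 years": 6.0, "11-15 years": 3.0}
density = use_per_meter(uses, meters)
```

In this invented case the raw counts fall by half in each band, yet the use per meter of shelf is the same in all three, which is the sort of reversal the corrected MIT data are reported to show.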
In 1975, Sandison collaborated on an article with Line to point out
information needed before citation and library use studies would be of
practical help in libraries.14 They mentioned such things as the relative
size of journals, which they considered important enough to be made a
special project of some national library; uses per subscription cost; uses
per article; recalls per keyword; and so on. Only when citation and use
studies take these factors into account will they be of any use either to
librarians making decisions about journal subscriptions, discarding
and binding, or to information system designers selecting material to
scan and items to include in an information system.
Taylor, too, sought a practical solution, this time to weeding,
partly in response to the earlier Seymour article.15 He discussed the
benefits and problems of a weeding program, suggesting (as mentioned
earlier) that obsolete material on the shelves can permanently discourage patrons. He compared subjective with objective criteria as the basis
for weeding decisions, and finally attempted to formulate a method for
identifying those periodical volumes which should be stored. The basis
for such a method could be reshelving data, citation data, photocopying
data, circulation data, or national loans data. The Newcastle research
revealed that a reshelving study nets only 20-25 percent of actual in-house use; and that even with saturation propaganda concerning the
study to prevent user reshelving, it was only possible to raise the level to
40 percent. His general formula was the 15/5 rule: a journal is a
candidate for storage if none of the last fifteen years of the journal has
circulated during the last five years. He excluded recent subscriptions
with fewer than five volumes received, and altered the rule somewhat for
titles in the humanities and discontinued titles. Nevertheless, this rule
should be of help to those libraries which circulate periodicals. It is
expressed in a fashion different enough so that it does not oversimplify
the complexity of obsolescence, although it offers some aid to weeders.
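
The 15/5 rule as paraphrased above can be written out as a short check. The Python below is a sketch under assumptions, not Taylor's own procedure: the function name, the record layout, and the exact treatment of the year boundaries are hypothetical, and the adjustments he made for humanities and discontinued titles are omitted because they are not specified here.

```python
def storage_candidate(volume_years, loans, current_year):
    """Apply the 15/5 rule to one journal title.

    volume_years: publication years of the volumes held for the title.
    loans: (volume_year, loan_year) pairs recorded for the title.
    """
    # Recent subscriptions with fewer than five volumes received are excluded.
    if len(volume_years) < 5:
        return False
    # Assumption: "the last fifteen years" means volumes less than 15 years old.
    recent_volumes = {y for y in volume_years if current_year - y < 15}
    # Assumption: "the last five years" means loans less than 5 years ago.
    recent_loans = [
        (volume, when)
        for volume, when in loans
        if volume in recent_volumes and current_year - when < 5
    ]
    # A candidate for storage only if no recent volume circulated recently.
    return not recent_loans
```

For a title held from 1960 through 1980 and examined in 1981, a single 1979 loan of the 1975 volume is enough to keep the title on the open shelves, whereas a 1979 loan of the 1962 volume is not, since that volume falls outside the fifteen-year window.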
Bulick and his associates, in what was termed a historical
approach, used preliminary data from the University of Pittsburgh
study to analyze the use of materials acquired in 1969.16 They found that
first-time use was greatest in the year of acquisition (1969), consistently
falling off after that until 1974, the last year for which data were
presented. By 1974, 56 percent of the acquisitions had been used at least
once. There was a similar drop in number of times circulated, so that the
largest percentage of items (about 14 percent) circulated once each, and
the smallest percentage (0.19 percent) circulated twenty-five times. It is
difficult to interpret these results, since we do not know the date of
publication of items, nor the processing lag time and other environmental factors at the specific locale-in this case, the Hillman Library at
Pittsburgh.
In 1977, one of the few studies of nonscientific journal literature
was published.17 Longyear found that journal articles in musicology do
not show an obsolescence pattern like scientific literature, and that even
articles seventy years or older are cited significantly. Further studies
should be done in other areas of the humanities and social sciences, and
an attempt made to discover whether there is any obsolescence pattern
for these fields at all.
Pan has argued that rank lists of journals based on citations can be
used as indications of library use.18 Line attacked this idea, and showed
that only a local-use study is of significant practical use in thedecisions
which librarians make.lg Typically, librarians are concerned with canceling subscriptions of the lesser-used journals, ones which are so far
down the list of ranked journals that their position is largely a matter of
chance because of a difference from other journals of only one or two
citations. Line's conclusion is that citation analyses and rank lists "can
be of great interest, and some value-but not to the practicing
librarian.
Hindle and Buckland have studied another research method-the
employing of circulation data to reflect use both in and outside the
library.21 The assumption has been made that circulation data are
indicative of total use; but for purposes of weeding, it is necessary to
show a title-by-title relationship of circulation and in-house use. Two
studies at the University of Chicago and Newcastle-upon-Tyne Polytechnic tended to show such a correlation. But the Newcastle study also
showed that the number of volumes used was apparently five times the
number left to be reshelved, which may cast doubt on some studies based
on reshelving data. A University of Lancaster study seemed to show that
books used in the library are also the ones which circulate as a class.
In-house use and circulation tend to vary directly, but these data reflect
usage, not demand. Usage and demand are identical only at zero and
diverge increasingly as demand increases. If a book is out seven or more
times a year, the researchers pointed out, the amount of time it spends in
the library is reduced enough to make research results erratic, since
in-library use is dependent on what is on the shelves. Their conclusion
was that in-house use often fell perforce on "unpopular books." Their
article suggested that in most cases an easy research technique would be
to compare circulation data with a random shelf-list sample and a desk
sample of those books left unshelved.
Gosnell's 1944 article was reprinted in summer 1978, with an
editor's note which observed that earlier studies on obsolescence had not
been followed up. The editor stated that at the time he knew of no
library which continuously derived, reviewed and incorporated obsolescence data;22 and we know of no such library at this time. Gosnell based
his study on the analysis of three book lists recommended for college
library acquisitions. He was able to demonstrate that newer and more
recent books were preferred by the makers of these lists, and postulated
the existence of an average book mortality which could be applied to
all books in general, as life insurance mortality tables apply to all
members of the population. He found that various subjects in the three
lists had an obsolescence rate of from 1.5 to 31.3, with the overall
averages being 8.1, 8.4 and 9.6. Gosnell then analyzed the holdings of
five college libraries and found generally lower obsolescence rates, i.e., a
greater percentage of older titles. This was particularly true in the
classics, where two libraries had a negative obsolescence rate, signifying
a preponderance of older material. An analysis of circulation at Hamilton College showed a much lower obsolescence rate, about 4.9 overall.
Gosnell suggested that these obsolescence ratings could be used for
accreditation purposes.23 They might also have significance for departmental book budgets: a field with a lower obsolescence rate might be
able to get by with a smaller budget than a more rapidly obsolescing
field, or conversely, a book purchase in a field with lower obsolescence
might be more cost-effective since it could be used for a longer period.
Bronmo put greater emphasis on the importance of literature
expansion.24 He called for diachronous studies which would prove or
disprove the possibility that apparent obsolescence is merely a function
of the growth of the literature. He studied the use of books on literary
criticism at the University Library of Tromso and found that for books
published after 1945, date of publication was not a significant predictor
of use. He admitted, however, that his results would probably not apply
to other libraries, although he theorized that more significant works in
literary criticism had been published between 1950 and 1954. His studies
excluded any books which he believed to be noncirculating because no
one lectured on those authors or wrote a thesis about them during the
year of his research. His conclusion was that bibliometric studies very
seldom have any immediate results.25

University of Pittsburgh Study

Perhaps the most famous recent study of obsolescence has been the
Kent study at the University of Pittsburgh.26 The purpose of the study
was to develop measures for determining the extent to which library
materials are used and what the costs are, to improve acquisitions
decisions, and to determine storage or discarding points at which alternatives to local ownership of various items became feasible. The
research was carried on over a period of seven years from 1968 to 1975
and was based chiefly on circulation statistics, in-house use sampling,
and journal use sampling at six science libraries. They found that 39.8
percent of the books acquired in 1969 did not circulate by 1975. Of those
that did circulate, 72.76 percent were borrowed during the year of
acquisition or the following year. The circulating items represented 75
percent of the titles used in-house, 99.6 percent of the outgoing interlibrary loans, and 98.1 percent of the reserve collection. They determined
that 54.2 percent of the 1969 purchases should not have been made if two
uses were considered cost-effective; 62.5 percent, if three uses. Unfortunately, most libraries have not yet determined how many uses of a book
are cost-effective. The Pittsburgh reshelving study found that 24.86
percent of books used in-house had never circulated and 43 percent did
not circulate within the sample time period or within the year following
the sample period. The researchers concluded that 75-78 percent of the
in-house books did circulate externally and, therefore, that external
circulation data provided a sufficiently accurate reflection of use.
Journals at the six science libraries generally had low use, except in
the physics library, where the librarian had aggressive marketing
techniques. Interestingly, photocopying of journals increased 13 percent after the first two years following publication, and increased a
further 11 percent after fifteen years. The proposed weeding rule derived
from all these data stated that an item should not be weeded before it is
seven years old, and only items which have not circulated should be
weeded after the age of seven.
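
Stated as a decision rule, the proposal is simple enough to express in a few lines. The Python below is a sketch only: the function and parameter names are hypothetical, and measuring age from the year of acquisition is an assumption, since the rule as summarized here does not say whether age runs from publication or from acquisition.

```python
def may_weed(acquired_year, circulation_count, current_year, minimum_age=7):
    """Return True only for items at least `minimum_age` years old that never circulated."""
    # Assumption: age is counted from the year the library acquired the item.
    age = current_year - acquired_year
    return age >= minimum_age and circulation_count == 0
```

Under this reading, an item acquired in 1969 with no recorded loans becomes eligible for weeding in 1976, while an item with even one loan is never eligible, whatever its age.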

Summary

Much basic research remains to be done on obsolescence.
Researchers have taken the concept as proven, but in fact it is still only a
hypothesis. The studies that have been done have concentrated heavily
on scientific fields at the expense of the social sciences and the humanities, and on journal articles at the expense of monographs. More should
be done in the humanities, if only to determine whether obsolescence is
a concept which cannot be usefully applied outside of the sciences.
Published articles need to be more informative about methodology, not
just giving results. In many cases, it is impossible to discover if the
reserve and reference collections are included in or excluded from the
percentages, an apparently small factor which could have a disproportionately large effect on the results. We need to consider what is meant
by use, and whether we can assign different values to different uses by
different populations, or whether we believe (or prefer to act as if we
believe) that all uses are equal. Should discarding be adjusted for irregularities in the curriculum, as Bronmo did when he excluded literary
criticism not circulating because no professor lectured on those authors
during that year? If no, the library may respond drastically to temporary
valuations. If yes, the library may be failing to respond quickly enough
to shifts in research fields. Many studies have been motivated by a need
to discard something and have been interested only in what should be
discarded, not in an ideally objective research model. This paper has
already indicated the problems of differentiating between synchronous
and diachronous studies, and the greater usefulness, as well as difficulty,
of the latter. It has been assumed that circulation reflects in-house uses
as well, but that may be inaccurate. Kent stated that 75 percent of the
titles used in-house had circulated during the sample period;27 this
leaves one in four of the in-house uses not reflected in circulation.
Hindle and Buckland noted that the number of nonrecorded in-house
uses in a study at Newcastle-upon-Tyne Polytechnic Library was twenty
times the number of recorded uses.28 They also found that reshelving
nets 20-25 percent of in-house use, which can be raised to 40 percent by
saturating the area with propaganda about the reshelving study. Clearly
we need an accurate way to determine in-house use before we can
conclude that it is reflected in external circulation records. In addition,
we need research on the extent to which planned or random factors in
the library can affect obsolescence. How much can libraries affect use of
material by layout and stack arrangement, by marketing techniques,
by storage, by cancellation of journal subscriptions, or initial failure to
buy? All these areas must be far more thoroughly researched before we
can claim to understand obsolescence.

Implications
And what has all of this meant to the librarian in the field? Unfortunately, not much. Not only is the concept of the obsolescence of literature and its implications for weeding and purchasing a touchy, political
issue, but the almost contradictory results of the research done to date
have only clouded the issue further.
First, the problems with the research completed thus far include the
failure to build upon past research in either disproving or proving older
hypotheses; there has not evolved a body of agreed-upon definitions nor
a common vocabulary; data gathering in a variety of library situations is
not done consistently; the mathematical nature of the theoretical work
is generally unclear to most practicing librarians; and because there is
no model or methodology which can be applied by librarians as part of
the ongoing library operation, obsolescence is not a topic often chosen
by librarians for consideration as a research or management activity.
Indeed, the evidence available thus far supports almost any course of
action because the research results are contradictory and ungeneralizable. As Line and Sandison point out, we have not yet even proven the
validity of the concept of obsolescence. Even if one disagrees with Line
and Sandison, every other study speaks strongly to the necessity for
investigation in each individual library to determine local and ad hoc
use peculiarities. And so librarians make decisions every day about what
to buy, what to store and what to discard, relying on their own
judgment.
Second, the significant question could be asked (and is raised by
some of those whose research is reported here) as to whether the effort
required in undertaking use studies, or in gathering other obsolescence
data, justifies the time and effort required. Not only would it take more
time than is now invested in maintaining awareness of collection use,
but there is no guarantee that the results could be applied any more
consistently nor be more beneficial. Most librarians are not yet convinced that this is a viable or more than peripheral topic.
Third, while the theoretical and mathematical nature of obsolescence can be investigated away from the library environment, the proof
or disproof of the theorems lies within the library doors, and it is
unfortunately often the case that the researcher and the librarian (if not
the same person) are not in sympathy with one another. We are all
familiar enough with this phenomenon to know that little credence will
be ascribed to research activity when some of the people affected have
not bought into the methodology and its results. This is particularly
true for a topic such as obsolescence, in which mathematical and theoretical skills must be linked to an intimate awareness of local library
idiosyncrasies, past practice and past selection practices.
A final reason why research results have had only limited application is that this area of library operations (buying, storing, discarding) is
one of the most uncertain and risky when we consider the implications
of incorrect actions. Not only are users denied immediate access to
desired information, but it is becoming increasingly difficult to fill in
gaps in the collection because of such factors as shorter print runs, etc.
Even the studies that are successful mathematically have not been able
to arrive at an algorithm or a guideline indicating which particular
book or volume or issue is the one which will or will not be used.
Human nature usually responds to situations involving high risk and
uncertainty in as safe a manner as possible. In this instance, it means
relying on one's own judgment in assessing the political and practical
realities rather than on some researcher's incomprehensible mathematical recommendations.

Today's Circumstances

The circumstances of yesterday, however, are not those of today.
More librarians today must deal with the practical difficulties of shrinking budgets and limited space for collection growth. Then, too, there are
the more difficult policy issues related to cooperative activities, networking and any concomitant shared collection-development agreements. The expansion of networking possibilities causes us to look
anew at such questions as the importance of local autonomies, the
possible limitation of the capacity to respond to local user needs
promptly and fully, and the possible irreversibility of shared collection
development decisions.
In addition, today's decision-making environment is expanding to
include the involvement of people outside the library-faculty, students, administrators, legislators, etc. Each of these people brings different and sometimes conflicting needs, demands, pressures, fears, and
beliefs which must be responded to or resolved in some manner.
Finally, for many there looms on the horizon the feeling that
today's technological explosion might shortly make librarianship as we
have known it obsolete. Even if that extreme case does not occur, it
certainly seems possible that technologically advanced storage devices,
collection access devices, communication lines, publishing and marketing innovations, and so forth will greatly alter what information libraries have to store, which users libraries might serve, and how that service
might occur.

A Problem-Solving Management Model


Research in preparation for this article has shown that the questions which remain to be answered in what has until now been considered a peripheral topic (obsolescence), and the questions which need to
be answered in responding to a central topic (operating libraries in
today's world), are intertwined and answerable only through the development of a new problem-solving/management model.
Incorporating the Model
The purpose of such a model would be to allow a library to derive,
review and incorporate data on obsolescence day by day. While a model
such as this can be designed in relation to other research topics such as
catalog use or budget forecasting, obsolescence can serve as an example
in describing how to go about bringing the librarian and the researcher
together. First, what has become increasingly obvious to many librarians is the need for a more sophisticated application of management
techniques and decision-making tools which can support library operations practically. These tools need to be based upon and built into daily
library operations since the time required for data gathering and analysis can be extensive and will not be taken consistently if the work is
add-on rather than ongoing.
Since, however, information transfer and use (the basis for all
library service) is still a highly theoretical topic involving human
psychology, intelligence, habit, diligence, and laziness (to name but a
few human qualities), it is impossible to approach solely as an operations management issue. In addition to administrative techniques,
therefore, we also want to include aspects of behavioral psychology,
statistics and mathematical analysis.
To construct the basic framework of the model, what is needed is
the union of the librarian and the researcher in a joint effort which can
utilize the best which both have to offer. The librarian brings the
in-the-trenches, day-to-day, practical experience with the library user
and the materials used. The researcher brings the mathematical, modeling and analytic skills. Together, the two could build a framework for
data gathering and analysis designed to be implanted into the library's
ongoing operations. While we would hope that the methodology would
permit as much generalization as possible, much more can be gained if
the model is sophisticated enough to be applied in a variety of types and
sizes of libraries, so that the patterns which might exist at the local or
national level can be detected as ad hoc results are combined and
analyzed.
Constructing the Model
The forum for constructing this model exists either in the American Library Association, where the various divisions have research and
policy committees, or in networks organized for other cooperative
endeavors. What is proposed here is a broad outline of how the model
might look and be applied. The purpose is to gather as complete and
consistent data as possible for a spectrum of libraries. In the case of
obsolescence there are two main questions which can be proposed. First,
what are the use patterns in libraries, and how can that use be ascertained? Second, what are the causal factors which interact to produce
those use patterns? In relation to the latter, we have been relying on
random influences, assuming they balance one another out, to produce
a quantitative ranking. But, as book publishers know, publicity, location, and even color of book jacket can affect use. Marketing in
libraries is another element which can affect use.
Other causal factors might include questions as to why and how
people do research. For example, concepts of the research project seem
to change during the course of research through refining and discarding
unusable topics. How would this pattern affect the use of materials in
libraries? One purpose of the model would be to distinguish true information use patterns from those information use characteristics resulting from local library policies, national policies and publisher
marketing policies.

Elements of the Model


The first part of the model, then, would be designed to gather as
much descriptive information as possible. The descriptive information
can be compared and combined to determine correlations among a
variety of possible elements. Elements to be considered might include:
1. Collection description: What is the nature of the institutions, student
population, curricula, faculty research interests, collection policies,
duplication agreements, weeding policies, and management of the
collection policies?
2. Acquisitions policies: How is the material budget divided between
serials, monographs and other formats? Who is responsible for selection? Are there any resource sharing agreements which might prescribe acquisition policies? How are funds allocated?
3. Technical services practices: How quickly after publication are
materials ordered? How quickly are materials received? How quickly
are materials processed, cataloged and otherwise made available?
What backlogs exist, and what is their nature, size and age? What
public catalog or other access tools are available? How many catalogs
are there and what is their nature? How are copies, volumes and
locations indicated? What filing rules are used?
4. Circulation practices and policies: Are users notified in some way of
new acquisitions? What are loan periods, recall and save policies?
Which categories of materials do not circulate? Are stacks open or
closed? Are some materials in storage, and if so, what are the policies
for selecting materials for storage? What is the quality of the stacks in
terms of shelving accuracy?
5. Bindery operation: What is the binding policy? Is the public notified
of material at the bindery? How long is material unavailable?
6. Reserve area: What is the reserve policy? What is the size and nature
of the reserve collection?
7. Other elements which might make the library easy or difficult to use:
What is the nature of the library's graphics, handouts, tours, library
instruction, specialized classes?
As can be seen from this description, the model can be designed to deal
with a very specific level of detail. While the remaining elements will
not be described so specifically, detailed elements can easily be drawn
from the earlier sections of the paper.
The second section of the model, then, would deal with external
factors which might influence use: publishers' marketing practices,
publishers' selection practices, publishing practices such as length of
volume or length of article accepted, shorter print runs, etc. The third
part of the model would explore: (1) knowledge and its nature: for
example, is publication increasing exponentially? and (2) information
use and transfer: how do people do research, how do people become
aware of new research, how is past research integrated into new research,
what types of users are there, and how might their use patterns differ?
The remainder of the model would be devoted to a variety of techniques
designed to detect user patterns consistently: for example, citation studies, and when and where they are applicable; circulation figures, and
when and how they might be analyzed; and journal use, detected either
from circulation figures or from some other technique for those collections where journals do not circulate.
The model including elements such as these could be constructed
by a combined task force of librarians and researchers to be applied in
the individual library, but designed so that it might be applied over a
variety of libraries, with information then fed into a larger analytical
body. The model would include not only standard descriptive elements
so that types of libraries could be ascertained, but also standard definitions and outline techniques for gathering and analyzing use data. It
would further include standard guidelines for costing out various
acquisition, storage and processing decisions so that trade-offs could
also be evaluated financially. Finally, it would provide guidelines for
altering statistic-keeping practices in order for standard statistics to be
implemented in a library and then brought together on a more comprehensive scale.
Once the model is constructed and tested, its application would not
only become part of the library's ongoing operation, but it would also
involve librarians and researchers in other sorts of information gathering activities as appropriate, particularly in the behavioral sciences and
information sciences aspect of the question. Results would regularly be
analyzed within the local library context, and those results and analyses
passed on to a larger analytical body for analysis and possible further
refinement of the model. Implementation of this model would provide
not only more sophisticated management of library operations, but also
information essential to the understanding of how libraries are used and
how information is used.

Conclusion
In conclusion, while the practical results of the obsolescence
research done to date are of little value or use in daily library operations,
many of the points under consideration are vital to ensuring the viability of library operations and are worthy of new consideration. Moreover,
the critical nature of today's library world makes it imperative that
librarians attempt a new approach to the management of library operations, including the investigation of the essentials upon which library
service is based. The construction of a series of comprehensive models
which can combine research with a library's ongoing activities will
begin to produce the information, data and quality library service
which can ensure that libraries continue to play an active role in the
information transfer process. If nothing more, the obsolescence research
done to date demonstrates that research must meet reality, and it is now
incumbent upon us as librarians and researchers to ensure that that
meeting is cordial, provocatively positive, and enhancing.


The Law of Exponential Growth: Evidence, Implications and Forecasts
JEAN TAGUE
JAMSHID BEHESHTI
LORNA REES-POTTER

THE NOTION THAT KNOWLEDGE grows exponentially seems to have first
appeared in a short story by Sir Arthur Conan Doyle, "The Great
Keinplatz Experiment," which contains the statement, "Knowledge
begets knowledge as money bears interest." Thus, knowledge growth is
likened to compound interest: the increase at any time is a fixed
percentage of the current amount. This type of growth is described
mathematically by an exponential function. If F(t) represents the size at
time t, the exponential function, or law, may be expressed as

    F(t) = a e^{bt}        (1)

where a is the initial size (i.e., the size at time t = 0) and b, the continuous
growth rate, is related to the percentage by which the size increases each
year (or other appropriate time unit). Specifically, this percentage is
given by

    r = 100(e^{b} - 1), or, approximately, r = 100b.

For example, if the amount of knowledge at some initial time is a = 10,000
and the growth rate is approximately r = 10 percent (b = 0.1), then after 10 years the
amount of knowledge will be

    F(10) = 10,000 e^{1.0} = 27,183.

After 100 years the amount will be

    F(100) = 10,000 e^{10.0} = 220,264,660.
Jean Tague is Professor, School of Library and Information Science, and Jamshid
Beheshti and Lorna Rees-Potter are doctoral students, University of Western Ontario,
London.

Another quantity that is of interest with respect to exponential
growth is doubling time: the fixed period of time in which the size of the
literature doubles. Doubling time is given by

    d = log_e 2 / b.

For the above example, the amount of knowledge doubles every d =
0.693/0.1 = 6.93 years.
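The arithmetic of equation (1), the percentage rate r and the doubling time d can be checked with a few lines of Python. This is only an illustrative sketch; the function names are chosen here and are not part of the original article.

```python
import math


def exponential_size(a, b, t):
    """Size F(t) = a * e^(b*t), as in equation (1)."""
    return a * math.exp(b * t)


def annual_percentage(b):
    """Exact yearly percentage increase r = 100 * (e^b - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


def doubling_time(b):
    """Doubling time d = ln(2) / b."""
    return math.log(2.0) / b


# Worked example from the text: a = 10,000 and b = 0.1.
a, b = 10_000, 0.1
print(round(exponential_size(a, b, 10)))    # size after 10 years
print(round(exponential_size(a, b, 100)))   # size after 100 years
print(round(annual_percentage(b), 2))       # exact r, slightly above 100*b
print(round(doubling_time(b), 2))           # doubling time in years
```

Note that the exact yearly percentage for b = 0.1 is a little above 10 percent, which is why the text calls r = 100b an approximation.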
Not all writers agree on the exponential nature of this growth.
Popper says the growth of knowledge "...is not a repetitive or cumulative process, but one of error elimination." Similarly, Rescher comments: "Science progresses not additively but largely subtractively.
Today's major discoveries represent an overthrow of yesterday's." [3]
Price [4] has brought the idea of exponential knowledge growth in the
sciences to the attention of a wide audience. He looks at various indicators of growth, including the number of scientists, number of scientific
journals, number of scientific abstracts, and amount of scientific expenditure. For the scientific literature, he found a growth rate of approximately 5 percent over the past two centuries, corresponding to a
doubling time of fifteen years. Growth of knowledge must be distinguished from growth of the literature or growth in number of publications. The former is a more abstract concept and hence not so directly
assessed. In bibliometrics, growth in number of publications is sometimes taken as a measure or operational definition of growth of knowledge. There are, however, other points of view. Rescher defines the
λ-quality level, 0 < λ ≤ 1, of a publication or finding as follows: if
there are F(t) publications in all at time t, then there will be [F(t)]^λ publications at the λ-level. He characterizes specific values as follows:

    λ = 1      at least routine
    λ = 3/4    at least significant
    λ = 1/2    at least important
    λ = 1/4    at least very important
    λ = 0      first-rate
For first-rate contributions (λ = 0), the number of publications is log F(t).
Rescher points out that the value λ = 1/2 corresponds to Rousseau's law,
which states that the number of important contributions is the square
root of the total number of contributions. Thus, if the size of the
literature is 1 million publications, in terms of Rescher's λ-levels,
there would be:

    1,000,000 at least routine publications
    31,623 at least significant publications
    1,000 at least important publications
    32 at least very important publications
    14 first-rate publications
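The counts in this list follow directly from Rescher's rule, and a short Python sketch can reproduce them. The function name and the dictionary of level labels are illustrative choices, and the logarithm is taken to be the natural logarithm, which is what matches the figure of 14.

```python
import math


def publications_at_level(total, lam):
    """Publications at or above quality level lam under Rescher's rule.

    For 0 < lam <= 1 the count is total ** lam; for lam == 0 (first-rate)
    it is the natural logarithm of total.
    """
    if lam == 0:
        return math.log(total)
    return total ** lam


levels = {
    "at least routine": 1.0,
    "at least significant": 0.75,
    "at least important": 0.5,
    "at least very important": 0.25,
    "first-rate": 0.0,
}

total = 1_000_000
for label, lam in levels.items():
    print(f"{round(publications_at_level(total, lam)):>9,} {label}")
```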

If the total literature (assuming anything published is at least
routine) is growing exponentially with a doubling time d, then the
literature of λ-quality, for λ > 0, is growing exponentially with the
doubling time of d/λ. Thus, as one ascends the quality scale, exponential growth slows down. For first-rate literature, exponential growth
breaks down completely and there is merely a constant increment in
each time period. In this case the growth function is linear, i.e., the
number of first-rate publications at time t is given by

    F_0(t) = log a + bt

when the total number of publications is given by (1). Here, b would
represent the constant increment. In the earlier example, in which the
doubling time was 6.93 years, the corresponding doubling times for
each λ-level group of publications would be

    9.24 years for at least significant publications,
    13.86 years for at least important publications,
    27.73 years for very important publications.

The number of first-rate publications at time t would be given by the
function

    F_0(t) = 9.21 + 0.1t

That is, there is only one additional first-rate publication every ten
years.
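The rule d/λ follows from raising equation (1) to the power λ: [a e^{bt}]^λ = a^λ e^{λbt}, so the growth constant becomes λb and the doubling time log_e 2 / (λb) = d/λ. The sketch below, with illustrative function names, evaluates the doubling times and the linear first-rate function for the running example (a = 10,000, b = 0.1).

```python
import math


def level_doubling_time(b, lam):
    """Doubling time of the lam-quality literature: ln(2) / (lam * b)."""
    return math.log(2.0) / (lam * b)


def first_rate_count(a, b, t):
    """First-rate publications at time t: F_0(t) = ln(a) + b*t."""
    return math.log(a) + b * t


a, b = 10_000, 0.1
for label, lam in [("significant", 0.75), ("important", 0.5), ("very important", 0.25)]:
    print(f"{level_doubling_time(b, lam):6.2f} years for at least {label}")

# Intercept ln(a) and the increase over ten years (slope b = 0.1 per year).
print(round(first_rate_count(a, b, 0), 2))
print(round(first_rate_count(a, b, 10) - first_rate_count(a, b, 0), 2))
```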
Exponential increase occurs when there are no limits to growth.
However, if there is some limitation, intellectual, physical, or economic, on the size of the literature, then other functions, such as the
logistic, may be more appropriate. Price points out that organisms in a
closed environment (e.g., fruit flies in a bottle) tend to follow a logistic
rather than an exponential growth function. The logistic curve is
characterized by a lower limit (usually 0) and an upper limit or ceiling,
beyond which size cannot grow. The equation for the logistic curve is

    F(t) = k / (1 + a e^{-bt})

where F(t) represents the size at time t, and k the ceiling. The shapes of
the logistic curve and exponential and linear ones in the same range are
shown in figure 1. The curve is symmetrical about the point of inflection at

    t = (log a) / b = t*.


If t < t*, the growth rate is increasing; if t > t*, the growth rate is
decreasing. Using the previous hypothetical example, if size at the
initial time t = 0 is 10,000 publications, the initial yearly growth rate is 10
percent and the upper limit is 300 million publications, then the
appropriate logistic function is

    F(t) = 300,000,000 / (1 + 29,999 e^{-0.1t}).

After ten years the size of the literature would be 27,181 publications,
i.e., almost the same as under exponential growth. However, after 100
years, the size would be only 127,013,560, instead of the 220,264,660
publications which would be obtained with exponential growth.
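The logistic example can be verified in the same way. In this sketch the constant a is derived from the ceiling and the initial size (F(0) = k / (1 + a)), and the inflection point is computed as log a / b; the function names are illustrative.

```python
import math


def logistic_constant(k, initial_size):
    """Constant a such that F(0) = k / (1 + a) equals initial_size."""
    return k / initial_size - 1.0


def logistic_size(k, a, b, t):
    """Logistic size F(t) = k / (1 + a * e^(-b*t))."""
    return k / (1.0 + a * math.exp(-b * t))


def inflection_time(a, b):
    """Time of the point of inflection, ln(a) / b, where F equals k / 2."""
    return math.log(a) / b


k, b = 300_000_000, 0.1
a = logistic_constant(k, 10_000)
print(a)                                   # constant a in the denominator
print(round(logistic_size(k, a, b, 10)))   # size after 10 years
print(round(logistic_size(k, a, b, 100)))  # size after 100 years
print(round(inflection_time(a, b), 2))     # years until growth starts to slow
```

With these values the inflection point falls a little after year 100, which is why the logistic and exponential curves are still close at ten years but far apart at one hundred.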
The growth pattern of subfields of knowledge or research areas may
be different from that of the parent field. Crane [5] suggests that some
subfields show the first three stages of a logistic pattern. These fields
are diffusion of agricultural innovations, 1941-66 (sociology), and theory of finite groups, 1934-68 (mathematics). Her characterization of
logistic growth is not strictly accurate. It involves four stages: a slow
start, a period of exponential growth, a period of linear growth, and
then a period of slow, irregular growth. However, as indicated above,
the logistic curve is perfectly symmetrical on either side of the midpoint,
with the growth rate always increasing before the midpoint and always
decreasing after the midpoint, but never constant or linear. In fact, the
growth curves shown for Crane's two subfields could equally well be
described as exponential followed by linear. This pattern was also
found by Lawson and others [6] in the energy analysis subfield. The closest
approximation to a true logistic curve seems to be the growth curve of
the coal gasification literature for the period 1965-75, as described by
Frame, et al.
In two other fields, invariant theory (1887-1941) and reading
research (1881-1957), Crane found a linear growth pattern. Sullivan
found a similar pattern in the physics literature, both experimental and
theoretical, concerned with weak interactions for the period 1950-72.
Menard found linear growth in the subfield of optics, but in three other
subfields of physics he found exponential growth, though at differing
rates: nuclear physics has doubled every four or five years since 1920 and
solid state physics since 1950; acoustics, on the other hand, had a
doubling time of forty years prior to World War II, but since then has
been doubling at normal rates, i.e., every fifteen years.
Menard distinguishes three types of subfields: stable fields, which
tend to grow linearly or exponentially at very slow rates; growth fields,
which grow exponentially at fast rates; and cyclic fields, which fluctuate, with stable and growth periods alternating. An example of a stable
field would be vertebrate paleontology, described by Menard. An example of a growth field would be activation analysis (chemistry), described
by Braun, for which doubling time over the period 1935-75 has been
three years. An example of a cyclic field, liquid crystals, was presented
by Bottle and Rees. During the period 1888-1974, the number of
publications increased to a peak in 1910, then decreased and lay dormant in the 1930s and 1940s, then increased exponentially in the 1960s.
Menard suggests that the overall growth rate of a discipline varies at
different times depending on the proportion of papers from stable,
growth and cyclic fields.

[Fig. 1. Cumulative numbers of Chemical Abstracts fitted by least-squares to
linear, exponential and logistic functions. Horizontal axis: year. Plotted series: cumulative data, exponential, logistic, linear.]
Goffman's epidemic model is, to some extent, similar to Menard's
cyclic model. Scientists are classified as: (1) infectives, those currently
publishing in the field; (2) removals, those who have published in the
past; and (3) susceptibles, those who may publish in the future. If S(t),
I(t) and R(t) represent, respectively, the number of susceptibles, infectives, and removals at a point in time t, then the change in these
functions can be described by a set of differential equations and a
threshold level determined for the number of susceptibles required to
produce an epidemic. The constants in these equations represent the
rate of infection, the rates at which susceptibles and infectives are
removed, and the rates at which new supplies of infectives and susceptibles enter the population. The model has been applied to the research
literature of mast cells; schistosomiasis, 1862-1962; symbolic logic,
1847-1962 [13]; and polywater, 1962-74 [14]. The curves for the first two literatures display the usual exponential pattern; symbolic logic literature is
cyclic, with peaks in 1907, 1932 and 1957; and polywater literature has a
single peak in 1970.
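The text does not give the differential equations themselves, so the following Python sketch is only one plausible reading of the verbal description, not Goffman's published formulation. It assumes an infection term beta*S*I, removal rates delta (susceptibles) and gamma (infectives), and constant inflows mu (susceptibles) and nu (infectives); all of these names and the numerical values are hypothetical. Under these assumptions, and with no inflow of infectives, the number of infectives grows only while S exceeds gamma/beta, which plays the role of the threshold.

```python
def epidemic_threshold(beta, gamma):
    """Susceptibles needed for infectives to increase (assumed model, nu = 0)."""
    return gamma / beta


def step(state, beta, gamma, delta, mu, nu, dt):
    """Advance (S, I, R) by one Euler step of length dt."""
    s, i, r = state
    ds = -beta * s * i - delta * s + mu   # infection, removal, new susceptibles
    di = beta * s * i - gamma * i + nu    # infection, removal, new infectives
    dr = delta * s + gamma * i            # everyone removed from S or I
    return (s + ds * dt, i + di * dt, r + dr * dt)


def simulate(s0, i0, r0, beta, gamma, delta=0.0, mu=0.0, nu=0.0, dt=0.1, steps=1000):
    """Return the list of (S, I, R) states, starting from the initial state."""
    states = [(s0, i0, r0)]
    for _ in range(steps):
        states.append(step(states[-1], beta, gamma, delta, mu, nu, dt))
    return states


# Hypothetical constants chosen only for illustration.
beta, gamma = 0.001, 0.1
print(epidemic_threshold(beta, gamma))

above = simulate(500, 5, 0, beta, gamma)   # S(0) above the threshold
below = simulate(50, 5, 0, beta, gamma)    # S(0) below the threshold
print(max(i for _, i, _ in above))         # infectives rise to a single peak
print(max(i for _, i, _ in below))         # infectives never exceed the start
```

A run that starts above the threshold produces a single peak of active authors followed by decline, the shape reported for the polywater literature; reproducing a cyclic pattern would require nonzero inflows or time-varying constants.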
The epidemic model is difficult to evaluate because of the indefiniteness in its presentation and applications. In no case are all three
functions S(t), I(t) and R(t) stated explicitly as functions of time,
although an exponential form is suggested for I(t). Also, the constants
required in the differential equations are not all estimated from the
empirical data. The impression is that any kind of cyclic or exponential
growth pattern is compatible with the epidemic model.
One general problem in describing the literature growth of a subfield is that it is difficult to determine when the subfield first arose from
its originating field. As Menard has pointed out, indexes and abstract
journals do not ordinarily create new classes or subheadings until after
the first 100 or so papers have appeared. Eventually, if the subfield
becomes very large, it will split into two or more subfields. Increasing
specialization is the response of scientists to an increasing literature
burden. However, recent investigations by Small indicate it may be
possible to identify specialties by means of cocitation-based content
analysis. [15]

The Evidence
What is the evidence for exponential growth? The answer depends
on what one is counting and when.
Knowledge growth may mean literature growth (increase in the
number of publications in a field) or information growth (increase in
the number of ideas in the field). As Gilbert [16] has pointed out in connection with indicators of scientific growth, the use of the former as a
measure of the latter assumes, first, that all knowledge is contained in
the published literature, and second, that every paper contains an equal
amount of knowledge.
Even if number of publications (where the word publication is used
in a broad sense to mean anything in the form of text) is a reasonably
valid approximation of the amount of knowledge, the reliability of
counts of publications in specific fields must be questioned. Usually,
these are based on items in the standard abstracting journal for the field.
Moravcsik [17] has pointed out that many scientific communications do
not appear as articles in scientific journals, the primary source of
materials for the abstract journals. Abstract journals are biased geographically and linguistically; they do not include material in near-print form, material which results from military or proprietary research
and is not published in the open literature, or informal person-to-person communication. Although the ideas in these other materials
may appear eventually in print, it is difficult to assess the number that
do not.
Bearing in mind the limitations of these data, let us, however,
examine the growth of the literature as revealed by counts of the number
of abstracts in some of the major abstracting journals. The chemical
literature has been analyzed more than any other, probably because of
the wide coverage of Chemical Abstracts and the stability of its growth
pattern. Figure 1 shows the cumulated number of chemical abstracts up
to 1979, together with the best-fitting linear, exponential and logistic
curves. By a cumulated curve is meant one in which the number of
abstracts is cumulated or summed from year to year, beginning at a
specified point in time-in this case, 1907. Best fit is defined by the
least-squares criterion. In looking at the literature of literature growth,
one is struck by the absence of data fitting by least squares. Most
exponential growth rates seem to be determined by eye from the empirical plots. Usually, the reader can determine empirical values only
approximately from the plots rather than exactly from a table. It is thus
difficult to check on the specified growth rates, doubling times and
other characteristics deduced by the author. The counts upon which the
figures in this paper are based are given in the appendix.
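As an illustration of what a least-squares determination of a growth rate involves, the sketch below fits F(t) = a e^{bt} by ordinary least squares on the logarithms of the counts. This is one common way to do it and is an assumption here: the article does not say whether its fits were made on logarithms or on the raw counts, and the two can give somewhat different constants. The data are synthetic and purely hypothetical.

```python
import math


def fit_exponential(years, counts):
    """Least-squares fit of ln(count) = ln(a) + b * (year - years[0]).

    Returns (a, b), where a is the fitted size in the first year and
    b is the continuous growth rate.
    """
    x = [year - years[0] for year in years]
    z = [math.log(count) for count in counts]
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = math.exp(mean_z - b * mean_x)
    return a, b


# Synthetic, hypothetical series growing at b = 0.05 from 1,000 items in 1960.
years = list(range(1960, 1980))
counts = [1000.0 * math.exp(0.05 * (year - 1960)) for year in years]

a, b = fit_exponential(years, counts)
print(round(a, 1), round(b, 4))
print(round(math.log(2.0) / b, 1))   # implied doubling time in years
```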
May [18] has pointed out that by beginning a cumulated curve in a
specific year such as 1907, the earlier literature is ignored. This usually
results in an overestimation of growth rates. For example, if the cumulated totals for the mathematics literature are begun in 1920 rather than
in 1868, the growth rate increases from 2.5 percent to 4.6 percent. May's
method for including the earlier literature is to fit the noncumulated
annual counts of publications to an exponential curve. This curve is
then integrated to obtain the corresponding cumulated curve. The
continuous growth rate (b in equation 1) will be the same for both
curves, but the constant factor (a in equation 1) will change. For example, applying May's method to the annual noncumulated output for
Chemical Abstracts 1907-79, one obtains the exponential curve:

    f(t) = 12,061 e^{0.046(t-1906)}

If this function is integrated from -∞ to 1907, the estimated cumulated
number of chemical publications prior to 1907, i.e., 262,196, is obtained.
This number is then added to the cumulated number of publications
since that time, as determined from Chemical Abstracts counts, to
obtain the data points in figure 1. The three theoretical curves are the
least-squares exponential, linear and logistic fits to these points. The
corresponding functions and multiple squared correlation coefficients
are given in table 1. The squared correlation coefficient represents the
proportion of the variation of cumulated size values which can be
explained by the theoretical function. The algorithm developed by
Oliver [19] was used in an attempt to find a least-squares fit to the logistic
curve, but unfortunately did not converge. The function given is thus
only an approximation to the least-squares solution.
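The pre-1907 estimate can be reproduced by integrating the annual curve analytically: the integral of a e^{bu} from minus infinity to an upper offset u_0 is (a/b) e^{b u_0}. The sketch below takes the growth constant as 0.046, the value for which 12,061/b reproduces the stated 262,196, and assumes the upper limit corresponds to a zero offset (t - 1906 = 0, i.e., the start of the 1907 volume year); both points are assumptions about the authors' convention. The annual counts passed to the second function are hypothetical.

```python
import math


def backlog_before(a, b, upper_offset=0.0):
    """Integral of a * exp(b*u) for u from -infinity to upper_offset."""
    return (a / b) * math.exp(b * upper_offset)


def cumulate_with_backlog(annual_counts, backlog):
    """Running totals of annual counts, started from the estimated backlog."""
    totals = []
    running = backlog
    for count in annual_counts:
        running += count
        totals.append(running)
    return totals


backlog = backlog_before(12_061, 0.046)
print(round(backlog))   # estimated publications before the first counted year

# Hypothetical annual counts, only to show how the data points are built.
print(cumulate_with_backlog([12_000, 13_000, 14_000], round(backlog)))
```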

TABLE 1
FUNCTIONS APPROXIMATING THE CUMULATIVE NUMBER OF CHEMICAL ABSTRACTS, 1907-79

Type          Function                                                    R²
Linear        F(t) = -999,000 + 88,013(t-1906)                            0.811
Exponential   F(t) = 282,546.94 e^{[...](t-1906)}                         0.995
Logistic      F(t) = 44,751,400 / (1 + 170.743 e^{-0.050(t-1906)})        0.986

For the Chemical Abstracts data, 1907-79, the exponential growth
rate is thus 4.5 percent, corresponding to a doubling time of fifteen
years. For the linear fit, the constant increment is 88,013 papers per year.
The midpoint of the logistic fit is at the year 2008, and the upper limit
for this function is 44,751,400 papers.
To compare the growth of the chemical literature with that in other
fields, annual counts of the number of abstracts from 1960 to 1979 were
recorded for the following journals: Science Abstracts (physics, electrical engineering, computers, and control), Biological Abstracts, Chemical Abstracts, Psychological Abstracts, Library and Information Science
Abstracts, International Political Science Abstracts, Historical
Abstracts, and Sociological Abstracts. Figure 2 shows cumulated
number of abstracts in Chemical Abstracts, Science Abstracts and Biological Abstracts, 1960-79; figure 3 shows the same data for Sociological Abstracts, International Political Science Abstracts and Historical
Abstracts; figure 4, the same data for Psychological Abstracts; and figure
5, the same data for Library and Information Science Abstracts. Groupings were determined, in part, by the scale of the vertical axis, and in part
by similarities in subject matter. In these cases, no correction was made
for pre-1960 literature, so that the data points shown in figures 2, 3, 4,
and 5 show cumulations relative to 1960 only. By fitting exponential
functions to both the noncumulated and cumulated values, using May's
method described earlier, it was possible to obtain growth rates either
incorporating or ignoring the pre-1960 literature. Fits were also made
just to the 1970-79 figures to determine if growth was changing in the
seventies.

[Fig. 2. Cumulative numbers of abstracts in three abstract journals, 1960-79. Series: Science Abstracts, Biological Abstracts, Chemical Abstracts. Horizontal axis: year, 1960-1980.]

[Fig. 3. Cumulative numbers of abstracts in three abstract journals, 1960-79. Series: International Political Science Abstracts, Historical Abstracts, Sociological Abstracts. Horizontal axis: year, 1960-1980.]

The annual growth rates for the two periods, 1960-79 and 1970-79,
based on cumulated and noncumulated figures, are shown in table 2. An
examination of these indicates that in the seventies, for the most part,
growth is slowing down. Rates are generally higher in the social sciences than in the physical and biological sciences, but it is not clear
whether this difference is due to an increase in the social science literature or a change in coverage of the abstracting journals. As far as
chemistry is concerned, Baker, in a review of Chemical Abstracts growth
rates [20], says that the journal coverage policy for Chemical Abstracts has
not changed in twenty-five years, although that for patents has changed.
The smaller growth rates obtained when the noncumulated values are
taken into account are consistent with May's predictions. Only in one
out of sixteen cases, Historical Abstracts for 1970-79, are the noncumulated rates greater than the cumulated ones. This anomaly may be due to
the strange behavior of Historical Abstracts annual production, which
increased approximately 60 percent in 1977. Also remarkable is the wide
variation in growth rates from decade to decade and science to science,
making questionable such blanket statements as "the scientific literature is growing at 5 percent per year." Also, it is not always clear, when
authors are discussing the growth of science, whether just the physical
and biological sciences are intended, or the social sciences as well.

[Fig. 4. Cumulative numbers of abstracts in Psychological Abstracts, 1960-79. Horizontal axis: year.]

The annual and cumulated data for each abstracting journal and
for the two time periods were fit to both exponential and linear functions using least-squares procedures. The resulting squared correlation
values are given in tables 3 and 4. In all cases, reasonable fits can be
obtained to either an exponential or linear function. In all cases except
Library and Information Science Abstracts, International Political
Science Abstracts, and Historical Abstracts, the linear fits were better for
the 1960-79 data, both cumulated and noncumulated. Thus, growth
does seem to be slowing down and moving toward a linear rather than
an exponential stage.
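The comparison of linear and exponential fits by squared correlation can be sketched as follows. The sketch assumes that the exponential curve is fitted on the logarithms of the counts and that R² is then computed on the original scale as 1 minus the ratio of residual to total sum of squares; the article does not state which procedure was used, so this is an illustration rather than a reconstruction. Both data series are synthetic and hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, predicted):
    """Proportion of the variation in y explained by the predicted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def compare_fits(x, y):
    """Return (R2 of the linear fit, R2 of the exponential fit)."""
    intercept, slope = linear_fit(x, y)
    r2_linear = r_squared(y, [intercept + slope * xi for xi in x])
    ln_a, b = linear_fit(x, [math.log(yi) for yi in y])
    r2_exponential = r_squared(y, [math.exp(ln_a + b * xi) for xi in x])
    return r2_linear, r2_exponential


x = list(range(20))
steady = [1000.0 + 500.0 * xi for xi in x]              # constant yearly increment
compounding = [1000.0 * math.exp(0.15 * xi) for xi in x]  # constant yearly percentage

print(compare_fits(x, steady))
print(compare_fits(x, compounding))
```

For the steady series the linear R² is the larger of the two, and for the compounding series the exponential R² is, which is the kind of comparison summarized in tables 3 and 4.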

[Fig. 5. Cumulative numbers of abstracts in Library and Information Science Abstracts, 1960-79. Horizontal axis: year.]

TABLE 2
ANNUAL GROWTH RATE PERCENTAGES FOR ABSTRACTS IN EIGHT ABSTRACTING JOURNALS, 1960-79

Abstract Journal (rows):
Science Abstracts
Biological Abstracts
Chemical Abstracts
Psychological Abstracts
Library and Information Science Abstracts
International Political Science Abstracts
Historical Abstracts
Sociological Abstracts

Annual Growth Rates (columns): 1960-79 Noncumulated; 1960-79 Cumulated; 1970-79 Noncumulated; 1970-79 Cumulated

9.0

6.2
7.3

19.0
15.4
16.6
17.8

2.0
1 .O
4.8

3.5

10.1
10.1

10.2

18.3

6.4

13.2

8.8
9.3

16.6
16.7

9.8
14.4

13.9
13.4

6.7

19.0

3.3

3.3

11.4
8.0

9.7

TABLE 3
SQUARED MULTIPLE CORRELATION COEFFICIENTS FOR LINEAR AND EXPONENTIAL FITS TO CUMULATED NUMBERS OF ABSTRACTS, 1960-79

Abstract Journal                             Linear Fit   Exponential Fit
Science Abstracts                            0.959        0.937
Biological Abstracts                         0.995        0.883
Chemical Abstracts                           0.977        0.911
Psychological Abstracts                      0.977        0.925
Library and Information Science Abstracts    0.930        0.960
International Political Science Abstracts    0.923        0.954
Historical Abstracts                         0.919        0.940
Sociological Abstracts                       0.987        0.879

TABLE 4
SQUARED MULTIPLE CORRELATION COEFFICIENTS FOR LINEAR AND EXPONENTIAL FITS TO NONCUMULATED NUMBERS OF ABSTRACTS, 1970-79

Abstract Journal                             Linear Fit   Exponential Fit
Science Abstracts                            0.913        0.910
Biological Abstracts                         0.833        0.770
Chemical Abstracts                           0.984        0.982
Psychological Abstracts                      0.922        0.864
Library and Information Science Abstracts    0.901        0.898
International Political Science Abstracts    0.821        0.853
Historical Abstracts                         0.759        0.880
Sociological Abstracts                       0.884        0.784

Abstract journal counts are useful for estimating growth within a
discipline. However, they cannot be added together to determine overall
literature growth because of journal overlap. Some attempts have been
made to estimate the total number of journals, but these seem to have a
rather low reliability, being heavily dependent on the source of the
counts. Ulrich's International Periodicals Directory, 1979-80, estimated
its total coverage to be 62,000 periodicals. Carpenter and Narin [21] used a
magnetic tape of all serial publications received by the British Library
Lending Division in 1973 and came up with 16,346 journals in the fields
of clinical medicine, biomedicine, biology, chemistry, physics, earth
and space science, psychology, mathematics, and engineering. An earlier count by Hulme in 1921, based on journals referred to in the
International Catalog of Scientific Literature, 1908-12, produced 7610
journals (excluding psychology and engineering). [22] Thus, for scientific
journals, the recent doubling time appears to be 57 years. A different
figure for total number of scientific and technical journals is given by
Gottschalk and Desmond of the Library of Congress in 1963. [23] Their
figure is 35,000 ± 10 percent, and is based on a perusal of the most
comprehensive and recent serial directory for each country. In 1962,
Bourne estimated the total number of journals, based on an inventory
being performed at the Science and Technology Division at the Library
of Congress, as 30,000 to 35,000. [24] The percentage of the literature
covered by abstracting journals varies from field to field. Overall, it is
about 75 percent, but ranges from 98 percent for chemistry to 50 percent
for biology. These percentages were estimated by editors and others
knowledgeable in the subject field. Thus, if Bourne's figures are correct,
the totals shown in figures 2-5 have varying reliability as measures of the
total literature production in a field.
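The 57-year doubling time quoted above for scientific journals can be recovered from the two journal counts if exponential growth between them is assumed. In the sketch below the earlier count is dated 1910, the midpoint of the 1908-12 catalog period; that dating is an assumption made here, since the text does not say which year was used.

```python
import math


def doubling_time_between(count_start, count_end, years_elapsed):
    """Doubling time implied by two counts, assuming exponential growth."""
    growth_rate = math.log(count_end / count_start) / years_elapsed
    return math.log(2.0) / growth_rate


# 7,610 journals around 1910 (assumed date) and 16,346 journals in 1973.
print(round(doubling_time_between(7_610, 16_346, 1973 - 1910), 1))
```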
Knowledge, particularly in the humanities, may be better represented by book rather than journal article production. Figure 6 shows
cumulated figures for numbers of first-edition titles produced by the
principal English-speaking countries, with the exception of Australia,
as compiled in the Unesco Statistical Yearbook. The data are available
for ten consecutive years from 1967 to 1976 for Canada, New Zealand,
United Kingdom, and the United States. The data constitute 24.8 percent of the world production of first editions for 1976. Of this figure, 17.2
percent is from the United States, 5.8 percent from the United Kingdom,
1.4 percent from Canada, and 0.4 percent from New Zealand. Unfortunately, Australian figures were incomplete and had to be omitted. Some
inconsistencies exist among the various countries. Whereas Canada
does not include its government publications in book production figures, 20 percent of the 1976 U.S. data consist of federal government
publications. In figure 6, the data will be seen to be linear (r² = 0.998 vs. r²
= 0.919 for the exponential function).
Interpretation

To what extent does number of publications actually measure
knowledge? Does each publication make a significant and equal contribution to the stock of ideas? One of the few empirical investigations of
this question was carried out by May [25], who classified mathematical
papers on the subject of determinants, as contained in a 1923 bibliography, into six categories: new ideas and results, applications, systematization and history, texts and education, duplications, and trivia. The
numbers of articles in each category and percentage of total are shown in
table 5. If these numbers are compared with Rescher's λ-quality index
and Rousseau's law, it is apparent that, in the subject area of determinants
at least, there are more than √1995 ≈ 45 important papers and log
(1995) ≈ 8 first-rate papers. However, the discrepancy may arise from the
fact that May considers as literature only scientific contributions
abstracted in professional mathematical journals, but not popularizations and elementary textbooks. Thus, the total number of publications
is probably greater than 1995.

[Fig. 6. Cumulative numbers of first editions published in the United States,
United Kingdom, Canada, and New Zealand, 1967-76.]
TABLE 5
MAY'S CATEGORIZATION OF THE LITERATURE OF DETERMINANTS TO 1920

Category                       Number of Papers   Percentage
New ideas and results                235               12
Applications                         208               10
Systematization and history          199               10
Texts and education                  266               13
Duplications                         350               18
Trivia                               737               37

May also analyzes individual time trends in each category. New results and ideas are stable, averaging about three per year. Applications are closely correlated with new results, with some time lag. Pronounced peaks are observed in texts, publications and trivia. May describes the pattern as follows: "First the basic theory is worked out in close relation to applications. Its successes lead to many textbooks and then to a rush into the field of workers who inevitably lower over-all quality."²⁶
Surprisingly, considering its importance to bibliometric approaches to the growth of knowledge, May's study has not been duplicated in other subfields. Of course, such analyses are very time-consuming and require expert knowledge. A criticism can be made that the assignment to categories is very subjective. Also, such a categorization fails to recognize that some duplication is necessary to ensure that new results reach a variety of audiences. However, in general, such analyses can be very revealing.
To investigate the viability of May's approach in another subfield and to familiarize ourselves with its problems, we applied a similar analysis to studies of obsolescence of library materials. The corpus of papers was obtained by checking the heading "Obsolescence of books, periodicals, etc." in Library Literature from its first appearance in 1970 and then extending the set to include appropriate references contained in the initial articles. The survey was restricted to English-language items.
Because of the small number of papers, forty-six in all, they were divided into four (rather than six) categories: (1) new ideas and results; (2) new applications; (3) reviews and historical surveys; and (4) popularizations, duplications, trivia. Initially, each paper was categorized by two of the writers independently. Disagreements were then resolved by discussion and more precise definition of the categories. The publication dates ranged from 1944 to 1980. The numbers and percentages for each category are given in table 6. Although not nearly so comprehensive as May's study, these figures do seem to substantiate his finding that new ideas and results (innovations) account for a relatively small percentage (in this case, 28.2 percent) of the total. The variation over time is shown in figure 7. The number of innovative articles remains relatively constant, whereas the total number increases, possibly exponentially, over the time period.

TABLE 6
LITERATURE OF OBSOLESCENCE, 1944-80

Category                 Number of Papers   Percentage   Number of Authors
New ideas and results          13               28              11
Applications                   11               24              11
Surveys and reviews             3                7               3
Other                          19               41              16

It has been suggested by Price and other bibliometricians that the degree to which articles represent innovations can be determined from citation counts. To assess this claim, the number of citations to each of the obsolescence papers published in the period 1944-77 was determined from Social Sciences Citation Index. Later papers were not included, as they had probably not yet really entered the citation cycle. Table 7 shows, for each category, the number of papers, the average number of citations per paper, and the minimum and maximum numbers of citations. It is interesting that in category 1, the earliest paper located (that by Gosnell in 1944²⁷) received only two citations. Apparently it was ahead of its time. Overall, one must conclude from this brief survey that although citations do give some indication of quality, they can be so used only in an approximate or average way and not for individual papers.
Fig. 7. Numbers of innovative papers and total papers published on obsolescence, 1944-80.

TABLE 7
CITATIONS PER ARTICLE FOR PAPERS ON OBSOLESCENCE, 1944-77

Article Category        No. Papers   Average         Minimum         Maximum
                                     No. Citations   No. Citations   No. Citations
New ideas and theory        13            12               1              28
Applications                11             7               0              14
Reviews                      3             6               4               8
Other                       19             4          [illegible]         23

Some historians and sociologists have made similar points about the use of publications as growth indicators and of citations as quality indicators. Moravcsik notes that differences in publication patterns in different countries and different fields make the use of a paper as a unit of knowledge somewhat suspect.²⁸ Computers may eventually so change the nature of papers and citations that it will no longer be possible to count them in any meaningful way. Also, once a discovery has entered the public domain, e.g., Einstein's equation E = mc², the original paper is not usually cited. Moravcsik suggests that publications and citation counts may be good first approximations to a measure of scientific growth: "The task then is to estimate the size of the correction to this approximation and to construct more refined but equally practical versions of these measures which take into account these [...]."²⁹
Chubin and Studer have similar reservations about the use of citations as indicators of importance or innovation. In a study of 656 articles about research on a DNA polymerase reverse transcriptase, they noted that only the force of facts (e.g., Baltimore and Temin and Mizutani did independently discover the DNA polymerase) keeps the larger, well-funded laboratories of Spiegelman and the National Cancer Institute from swamping the citation [...].³⁰ Chubin and Moitra classify citations as essential (basic and subsidiary), supplementary (additional and perfunctory), and negative (partial and total). In a study of 443 references in forty-three articles in high-energy physics, they found 57.1 percent of the citations were either supplementary or negative.³¹

Forecasts
In 1963, Price said: "There is a possibility the exponential law is breaking down."³² Exponential growth cannot go on forever. Recent figures seem to indicate that this change is indeed occurring. Price predicts that, when limits to growth are imposed on such a process, there will be various reactions: escalation of a new process, loss of definition of the old process, divergent (i.e., widely fluctuating) oscillations, or oscillations converging to the limit. Like Moravcsik, he feels changing communication patterns among scientists, brought about by new technology, will lead to a situation in which publications are of secondary value in communicating innovations (for popularization rather than research needs).
Rescher believes that this "quality drag" principle (i.e., that exponential increase in the total number of papers is needed to produce a linear increase in the number of first-rate papers) means that, eventually, the pace of innovation (i.e., first-rate findings) will begin to decline.³³ He regards the exponential increase in publication not as useless verbiage but as the useful and necessary inputs needed for genuine advances. However, in an age of dwindling resources, the world can no longer afford exponential input. Thus, growth in number of publications will become linear, and perhaps has already become linear in the seventies. The growth in cumulative number of first-rate publications will then be logarithmic, i.e.,

\[ F_1(t) = \log_e(a + bt), \]

and the continuous growth rate will become

\[ \frac{b}{a + bt}. \]

In other words, the further into the future we go, the fewer the additional number of first-rate publications. We are moving from an exponential growth past to a linear growth future.
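The argument can be illustrated numerically. The sketch below assumes, following the text, that the cumulative number of first-rate publications is the natural logarithm of the cumulative total. The function names and the parameter values (n0, k, a, b) are hypothetical and were chosen only for illustration.

```python
import math


def first_rate_cumulative(total_publications):
    """Assumed relation: first-rate publications = ln(total publications)."""
    return math.log(total_publications)


def exponential_total(t, n0=1000.0, k=0.05):
    """Total publications under exponential growth: n0 * exp(k * t)."""
    return n0 * math.exp(k * t)


def linear_total(t, a=1000.0, b=50.0):
    """Total publications under linear growth: a + b * t."""
    return a + b * t


def first_rate_growth_rate(t, a=1000.0, b=50.0):
    """Derivative of ln(a + b * t) with respect to t: b / (a + b * t)."""
    return b / (a + b * t)


# Yearly additions of first-rate publications under each regime.
yearly_gain_exponential = [
    first_rate_cumulative(exponential_total(t + 1))
    - first_rate_cumulative(exponential_total(t))
    for t in range(5)
]
yearly_gain_linear = [
    first_rate_cumulative(linear_total(t + 1))
    - first_rate_cumulative(linear_total(t))
    for t in range(5)
]
```

With exponential input the yearly gain stays constant at k, which is the linear increase in first-rate papers; with linear input the yearly gain shrinks from year to year, in step with b/(a+bt).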
To conclude, many papers have tried to estimate the growth of
knowledge in various ways, and as many questions have been raised
about the validity and reliability of bibliometric measures for this
process. It appears that, for the "growth of knowledge" subfield, the
time is not yet ripe for a logarithmic decline in the number of first-rate
papers. There is an obvious need for better compilations of statistics on
numbers of publications in the various disciplines on a worldwide scale,
for informed, critical assessments of the amount of new knowledge
contributed by these publications, and for enhancements and refinements of the present bibliometric techniques (citation and publication
counts), so that valid measures of knowledge growth may be obtained.
Also, studies of literature growth need to become more exact in the
description of their models and more rigorous in the application of
statistical tests to determine how well these models fit reality. Only then
will bibliometrics be able to provide accurate, useful descriptions
and predictions of knowledge growth.

References
1. A. Conan Doyle. Quoted in Nicholas Rescher. Scientific Progress. Pittsburgh: University of Pittsburgh Press, 1978, p. 54.
2. Popper, Karl. Objective Knowledge; An Evolutionary Approach. Oxford: Clarendon Press, 1972, p. 144.
3. Rescher, Scientific Progress, p. 48.
4. Price, Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963; and Price, Derek de Solla. Science Since Babylon. New Haven, Conn.: Yale University Press, 1961.
5. Crane, Diana. Invisible Colleges. Chicago: University of Chicago Press, 1972.
6. Lawson, J., et al. "A Bibliometric Study on a New Subject Field; Energy Analysis." Scientometrics 2(1980):227-37.
7. Frame, J. Davidson, et al. "An Information Approach to Examining Developments in an Energy Technology: Coal Gasification." Journal of the ASIS 30(July 1979):193-201.
8. Crane, Invisible Colleges; Sullivan, Daniel, et al. "The State of Science: Indicators in the Specialty of Weak Interactions." Social Studies of Science 7(May 1977):167-200; and Menard, Henry W. Science: Growth and Change. Cambridge, Mass.: Harvard University Press, 1971.
9. Braun, T., et al. "An Analytical Look at Chemical Publications." Analytical Chemistry 52(May 1980):617A-29A.
10. Bottle, R.T., and Rees, M.T. "Liquid Crystal Literature." Journal of Information Science 1(May 1979):117-19.
11. Goffman, William. "Mathematical Approach to the Spread of Scientific Ideas - the History of Mast Cell Research." Nature 212(29 Oct. 1966):449-52.
12. Goffman, William, and Warren, Kenneth S. "The Ecology of the Medical Literatures." American Journal of the Medical Sciences 263(April 1972):267-73.
13. Goffman, William. "A Mathematical Model for Analyzing the Growth of a Scientific Discipline." Journal of the ACM 18(1971):172-85.
14. Bennion, Bruce, and Neuton, Laurence. "The Epidemiology of Research on Anomalous Water." Journal of the ASIS 27(Jan.-Feb. 1976):53-56.
15. Small, Henry G. "A Co-Citation Model of a Scientific Specialty." Social Studies of Science 7(May 1977):139-66.
16. Gilbert, G.N. "Measuring the Growth of Science: A Review of Indicators of Scientific Growth." Scientometrics 1(1978):9-34.
17. Moravcsik, Michael J. "Measures of Scientific Growth." Research Policy 2(Oct. 1973):266-75.
18. May, Kenneth O. "Quantitative Growth of the Mathematical Literature." Science 154(30 Dec. 1966):1672-73.
19. Oliver, F.R. "Methods of Estimating the Logistic Growth Function." Applied Statistics 13(1964):57-66.
20. Baker, Dale. "Recent Trends in the Growth of the Chemical Literature." Chemical and Engineering News 54(1976):23-27.
21. Carpenter, M.P., and Narin, F. "The Subject Composition of the World's Scientific Literature." Scientometrics 2(1980):53-63.
22. Hulme, Edward W. Statistical Bibliography in Relation to the Growth of Modern Civilization. London: Butler and Tanner, 1923.
23. Gottschalk, Charles M., and Desmond, Winifred F. "Worldwide Census of Scientific and Technical Serials." American Documentation 14(July 1963):188-94.
24. Bourne, Charles P. "The World's Technical Journal Literature." American Documentation 13(April 1962):159-68.
25. May, Kenneth O. "Growth and Quality of the Mathematical Literature." ISIS 59(Winter 1968):363-71.
26. Ibid., p. 368.
27. Gosnell, Charles F. "Obsolescence of Books in College Libraries." College & Research Libraries 5(March 1944):115-25.
28. Moravcsik, "Measures."
29. Ibid., p. 275.
30. Chubin, Daryl E., and Studer, K.E. "Knowledge and Structures of Scientific Growth." Scientometrics 1(1979):185.
31. Chubin, Daryl E., and Moitra, Soumyo D. "Content Analysis of References." Social Studies of Science 5(Nov. 1975):423-41.
32. Price, Little Science, p. 19.
33. Rescher, Scientific Progress.


Appendix

Statistics Used for Graphs in the Text

The counts upon which the figures are based are as follows:
Figure 1

Year   Chemical Abstracts      Year   Chemical Abstracts
1907        11,847             1944        43,700
1908        15,169             1945        33,672
1909        15,459             1946        39,578
1910        17,545             1947        39,288
1911        21,682             1948        43,996
1912        23,194             1949        53,441
1913        26,630             1950        59,098
1914        25,115             1951        63,033
1915        18,981             1952        70,147
1916        16,108             1953        75,091
1917        15,945             1954        80,615
1918        13,881             1955        86,322
1919        15,240             1956        92,396
1920        19,326             1957       102,525
1921        20,451             1958       118,930
1922        24,098             1959       127,196
1923        25,315             1960       134,255
1924        26,643             1961       146,893
1925        27,097             1962       169,351
1926        30,238             1963       171,404
1927        33,491             1964       189,993
1928        39,135             1965       197,083
1929        48,293             1966       220,303
1930        55,146             1967       242,527
1931        52,728             1968       232,508
1932        59,461             1969       252,320
1933        66,153             1970       276,674
1934        61,570             1971       308,976
1935        63,413             1972       334,426
1936        64,572             1973       321,005
1937        64,735             1974       333,642
1938        66,928             1975       392,234
1939        67,108             1976       390,905
1940        53,680             1977       410,137
1941        50,494             1978       428,342
1942        45,646             1979       436,887
1943        43,669

Figure 2

                   Number of Abstracts
Year   Science Abstracts   Biological Abstracts   Chemical Abstracts
1960        21,410               72,530                134,255
1961        21,160               87,000                146,893
1962        24,240              100,790                169,351
1963        26,000               75,710                171,404
1964        31,000              107,100                189,993
1965        34,000              110,120                197,083
1966        38,000              120,100                220,303
1967        40,790              125,030                242,527
1968        50,480              130,020                232,508
1969        49,610              135,010                252,320
1970        79,830              140,030                276,674
1971        84,340              140,020                308,976
1972        85,180              140,000                334,426
1973        81,350              140,040                321,005
1974        83,370              140,020                333,642
1975        87,630              140,020                392,234
1976        74,180              142,510                390,905
1977        91,670              145,010                410,137
1978        96,580              149,010                428,342
1979       101,240              154,990                436,887

Figure 3

                   Number of Abstracts
Year   Historical   International Political   Sociological
       Abstracts    Science Abstracts         Abstracts
1960     2,925          1,461,000                1,905
1961     2,776          1,510,000                2,322
1962     3,096          1,415,000                2,952
1963     3,926          1,355,000                3,810
1964     3,623          1,467,000                6,062
1965     3,363          1,471,000                4,262
1966     3,516          1,492,000                5,130
1967     3,527          1,574,000                5,434
1968     3,417          1,450,000                5,969
1969     4,180          1,693,000                6,019
1970     4,015          2,206,000                6,000
1971     6,406          2,244,000                6,981
1972     6,359          2,998,000                7,190
1973     7,607          4,555,000                6,689
1974     7,244          4,955,000                6,982
1975     8,779          5,015,000                7,687
1976     9,094          5,039,000                7,289
1977    15,414          5,040,000                8,267
1978    15,675          5,075,000                8,339
1979    15,692          5,105,000             [illegible]

Figures 4 and 5

                   Number of Abstracts
Year   Library and Information   Psychological
       Science Abstracts         Abstracts
1960         1,003                   8,532
1961           968                   7,353
1962           986                   7,700
1963         1,052                   8,381
1964         1,054                  10,500
1965         1,104                  16,619
1966         1,106                  13,622
1967         1,053                  17,202
1968         1,226                  19,586
1969         2,567                  18,068
1970         2,858                  21,722
1971         2,619                  23,000
1972         3,177                  17,976
1973         3,037                  24,409
1974         3,837                  25,558
1975         3,870                  25,542
1976         3,781                  24,687
1977         4,721                  27,004
1978         4,886                  26,292
1979         4,217                  29,714

Figure 6

Years: 1967-1976
No. of First Editions (eight of the ten annual values are legible, given here in source order): 79,289; 78,875; 87,604; 95,433; 97,469; 103,679; 112,300; 110,715

Figure 7

Columns: Year; Number of Innovative Papers; Total
Years: 1944, 1959, 1960, 1961, 1963, 1965, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980
[counts not legible in source]


Teaching Bibliometrics
ALVIN M. SCHRADER

BIBLIOMETRICS, THE SCIENTIFIC STUDY of recorded discourse, offers much promise for enhancing university curricula in the informational domain. This promise involves two dimensions of empirical knowledge, a theoretical dimension and a practical dimension, and so ought to interest not only researchers and educators but professional practitioners as well. This promise issues from the special nature of empirical knowledge, by which ideas about the world can be related to practical activity. The special nature of such knowledge is derived from what might be called a metatheory about the logic of inquiry. This metatheory is outlined below.
Bibliometrics taken as theoretical knowledge is the quantitative characterization of the properties of recorded discourse. Quantitative characterization is the setting forth of probabilistically true ideas about selected phenomena. These ideas express patterns, tendencies and regularities that are said to be inherent in the phenomena. Such ideas, because they describe general qualities, form empirical theory or just theory. Maccia (now Steiner) and Maccia put it this way: "Understanding should lead to explanation, because understanding provides relationships or regularities which make sense of our happenings. To explain is to appeal to regularities, i.e., to appeal to theory."² Thus, the objective of bibliometrics as a scientific study is to produce ideas (that is, theory) about recorded discourse and its various important properties.
Alvin M. Schrader is a doctoral candidate, School of Library and Information Science,
Indiana University, Bloomington.
In addition, bibliometrics is considered to have promise in the realm of practical knowledge, because theory permits control. More is involved, however, than simply theory. A developmental bridge is required by which theoretical knowledge is related to both the means and the ends of the proposed practice. It is not only the effectiveness of a practice that must be considered, but also its intrinsic merit, for a practice is a system of human acts devised to bring about an intended condition, and so involves values. This linking process from theory to practice is described as development inquiry, operations research, or systems analysis, though the latter two terms have generally connoted a much narrower perspective of means-oriented research only.
If bibliometrics, as seen in the context of metatheory, has theoretical and practical dimensions, so that it can contribute both to our intellectual understanding and to the control of professional activity, then it is plausible that bibliometrics contains the elements of a scientific discipline or, at least, elements for undergirding such a discipline within the domain of informational phenomena and problems. But if bibliometrics has so much promise, where is the spark that will inspire curiosity and consensus about this domain, and launch the needed programs of empirical inquiry?
The missing ingredient is the collective imagination and commitment of our community of educators and researchers. True, under the disciplinary umbrella of information and library science, one can identify a small (and growing) constituency of enthusiasts who take as self-evident the power of quantitative research to enhance thinking about informational phenomena and problems. Unfortunately, however, most members of this amorphous scholarly community have proceeded through graduate school and on to professional practice and teaching and research without even seeing the term "bibliometrics" in print. They still speak of "universal bibliographic control" as though it were a meaningful concept, and do not accept the notion that recorded discourse consists of a set of many overlapping literatures, each of which exhibits a statistical structure.
This unsatisfactory condition is exacerbated by library school doctoral programs which, with few exceptions, are still very weakly committed to quantitative research in general, and even more weakly committed to bibliometrics in particular. There are many impediments within the graduate library schools to the attainment of scholarly excellence in mainstream academia. These impediments add up to an inventory of neglect and intellectual confusion. Among the most relevant to teaching bibliometrics are present library school curricula, research methods textbooks, and the professional literature.
With respect to curricula, only a few library schools offer a bibliometrics course, and almost always on an ad hoc basis; some individual
faculty have inserted isolated components into traditional courses. The
directory of the Association of American Library Schools for 1980 did
not list bibliometrics in its classification of teaching areas. This is an
important indication of scholarly attitudes toward it.
A second illustration of impediments to bibliometrics concerns research methods textbooks. In the one most recently published for graduate library school students, Busha and Harter⁴ devote only one-half page to bibliometrics, while other methodologies receive much greater priority: five pages for content analysis, a 20-page chapter for operations research, and a 30-page chapter for historical method. Such a long discussion of historical method, enigmatic in the context of graduate education for information professionals in the 1980s world of scientific advance and managerial accountability, reflects persistence of the old library school ideology, an ideology of 100 percent book collections, scholar-librarians, parochial history essays, and white gloves.
Another impediment to bibliometrics in library schools concerns the professional literature and its bibliographic control. The Journal of Education for Librarianship, for example, has published over the past twenty years something less than a handful of articles which employed a bibliometric analysis, and none at all which investigated a bibliometric methodology and its assumptions. Another similar indicator of the absence of interest among educators and researchers in bibliometrics is the fact that only one comprehensive review article, by Narin and Moll,⁵ has appeared in the Annual Review of Information Science and Technology since its inception in 1966, despite their confident prediction in that review that future issues would treat bibliometrics in greater depth. No general reviews at all have appeared in Advances in Librarianship since it began in 1970, though for the record it should be noted that it did publish a review of one type of bibliometric application to library collection building, by Broadus.
With respect to bibliographic control of the literature of bibliometrics, Ferrante has indicated that fifty-two synonymous and semisynonymous search descriptors were required to retrieve the relevant publications during the period from 1969 (when Pritchard first introduced the term "bibliometrics" in place of "statistical bibliography"⁷) until 1977. She noted that: "While Library and Information Science Abstracts and Library Literature both picked up the term bibliometrics by 1971, Information Science Abstracts vacillated until 1973....Neither ERIC nor L.C. Subject Headings include the term among their subject headings...."⁸
These illustrations of impediments to the introduction of bibliometrics into graduate library school curricula can be placed in the larger perspective of major weaknesses in the knowledge base of educators and researchers. The major weaknesses are seen to be their atheoretical approach to problem-solving and their elementary descriptive approach to quantification.
The atheoretical approach to problem-solving is illustrated pointedly by the semantic confusion in the literature between theory and philosophy, in that pleas for a philosophy of library science are taken to be pleas for theory, and the terms are used interchangeably. Philosophy, however, is value theory and is sorted out in logic and epistemology from empirical theory, so that ideas about what ought to be and what ought to be done are differentiated from ideas about what exists in the world. Value theory is not a substitute for empirical theory, but rather, as has been demonstrated already, is a necessary complement in development inquiry which links theory to practice. In any event, pleas for a philosophy of library science have usually boiled down to weak attempts to rationalize the genteel empiricism in which educators and researchers have functioned since the 1870s.
A second major weakness concerns educators' and researchers' traditionally elementary approach to quantification. The charge is frequently made that librarians are hostile to numeracy and quantitative research, but this charge seems inadequate as a description of practitioners' attitudes toward quantitative expression. In fact, numbers as quantifiers of library activity and library services are not simple-mindedly avoided or despised, but on the contrary are universally employed to describe such variables as library holdings, book circulation and salaries. The problem is not professional hostility, fear, anxiety, or other psychoanalytic peculiarities brought by students to graduate library schools. The problem is that educators and researchers have left the professional community innumerate and deficient in dealing adequately with quantification. How can graduates go beyond elementary description of data if they have not been educated to do so? How are they to learn that mere data collection is not the complete act of research if their educators teach that it is? How are they to come to an understanding of what Cole and Eales meant in 1917 by a statistical analysis of a literature? Or what Hulme meant in 1923 by statistical bibliography of scientific literature for documenting the history of science? Or what Lotka meant in 1926 by the logarithmic frequency distribution of scientists' productivity to the progress of science as indicated by publications? Or what Bradford meant in 1934 by the law of distribution of papers on a given subject in scientific periodicals? Or what Gosnell¹³ meant in 1944 by treating book collections as populations with averages and general trends, one of which was that book obsolescence rates correspond to an exponential curve?
The quantitative literature, though sparse, has always been there. Library school educators and researchers have not. Presumably, security of institutionalization in university graduate departments has lulled them into complacency with the status quo. However, it is altogether probable that the intellectual confusion which has resulted from this complacency will not satisfy the academic demands posed by an information-consuming world. If the informational community eventually attains a higher-order social role, its emergence from atheoretical empiricism and innumeracy may well turn out to emulate the history of the medical profession, described succinctly by Thomas:

For century after century, all the way into the remote millennia of its origins, medicine got along by sheer guesswork and the crudest sort of empiricism. It is hard to conceive of a less scientific enterprise among human endeavors. Virtually anything that could be thought up for the treatment of disease was tried out at one time or another, and, once tried, lasted decades or even centuries before being given up. It was, in retrospect, the most frivolous and irresponsible kind of human experimentation, based on nothing but trial and error, and usually resulting in precisely that sequence. Bleeding, purging, cupping, the administration of infusions of every known plant, solutions of every known metal, every conceivable diet including total fasting, most of these based on the weirdest imaginings about the cause of disease, concocted out of nothing but thin air: this was the heritage of medicine up until a little over a century ago. It is astounding that the profession survived so long, and got away with so much with so little outcry.¹⁴

A rationale for moving bibliometrics into the mainstream of graduate library school curricula has been set forth based on the logic of inquiry. Indeed, bibliometric knowledge ought to be integrated into existing courses and, at the same time, specialized programs ought to be offered at both the MLS and Ph.D. levels for advanced study of both theory and methodology. There is a growing body of researchers and educators who are utilizing and extending bibliometrics, and some scholarly community will no doubt lay claim to this domain in the near future. If that scholarly community is not the library schools as presently constituted, then there are other plausible claimants, including (but not limited to) academic programs of information science, sociology of knowledge, computer science, public policy, education, and history and philosophy of science. Indeed, the pioneering advances in relevant theory have so far come from scholars outside the library schools, scholars such as Merton in the sociology of science, Kuhn in the history of science, and Price in the history of science and medicine.
If none of the foregoing arguments for teaching bibliometrics has been convincing, the only remaining appeal is to an observation attributed by Pritchard to Fairthorne: "Numerical data may or may not be dull, but they are the only alternative to thumping the table and affirming one's intuitions."¹⁵
Proposal for an MLS Course in Bibliometrics

The proposal for a course in bibliometrics set forth here is notably tentative and pertains to the MLS level; doctoral work in bibliometrics should focus on theory construction and testing, and on advancing the methodology and statistical techniques. The only previous discussion in the literature of teaching bibliometrics was by Aiyepeku, but he did not furnish an exemplar syllabus, which is the intention of this article.
Proposed course objectives are: (1) to teach students the basic principles of bibliometrics as related to scholarly literature; (2) to work
toward the construction of adequate theory of bibliometrics; and (3) to
review the practical applications of bibliometric methods for information retrieval systems. The emphasis of the course will be on the theoretical aspects of bibliometrics within the framework of compatible
research traditions such as epistemology, sociology of knowledge, scientific communication theories, and history and philosophy of science.
Students will familiarize themselves with the seminal papers and landmark literature of bibliometrics; examine major problem areas for
definitions, key assumptions, methodological procedures, and statistical distributions; and formulate theoretical statements.
No course prerequisites are assumed, but much of the substance of bibliometrics involves the logic of inquiry and techniques of quantification; hence math anxiety should be avoided. Since standard parametric statistics are generally not utilized in describing and evaluating bibliometric distributions, there is no reason to require advanced familiarity with them; an understanding of nonparametric statistical tests (e.g., Siegel) and lognormal distributions (e.g., Pratt¹⁸) would be very helpful, but unrealistic to require of MLS students. At the doctoral level, however, learning these nontraditional statistical procedures and distributions should be a major priority, so that a core of numerate researchers can be developed for advancing the theory and methodology of bibliometrics.
156

LIBRARY TRENDS

Teaching Bib liometrics


A suggested range of student assignments for the MLS course
O~~OWS.'~

1. A citation analysis of a library and information science journal with
respect to core journals, journal-to-journal citation, core of authors,
journal scatter, or subject dispersion.
2. Using the Sweaney²⁰ interpretation of Bradford's law, plotting two
sets of data and calculating possible estimates for the parameters of
journal variables, articles per zone, and multiplier. Alternate projects
are plots for Lotka's law²¹ or for Pratt's measure of class
concentration.²²
3. A bibliographical analysis of the literature of one of the following
subjects: referencing theories; typologies of citations; citation errors;
bibliographic coupling; cocitation analysis; author collaboration;
corporate authorship; author institutional affiliation; author discipline affiliation; obsolescence of literature; and referencing in nonscientific literatures.

Minimum expectations in papers would include the provision of a
theoretical framework, definition of terms, explication of assumptions,
and a review of related research. Of course, it is anticipated that this
issue of Library Trends will also stimulate a variety of ideas that could
become the focus of student assignments.
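
As a rough illustration of the Bradford's-law assignment above, the following Python sketch divides a ranked list of journals into zones that each yield about the same number of articles, then reports the ratio between successive zone sizes (the multiplier). The function names `bradford_zones` and `multipliers`, the three-zone default, and the sample counts are assumptions made for illustration; they are not taken from Sweaney's report or from any published data set.

```python
def bradford_zones(article_counts, n_zones=3):
    """Split journals, ranked by productivity, into zones of roughly equal article yield.

    Returns a list of (journals_in_zone, articles_in_zone) pairs.
    """
    ranked = sorted(article_counts, reverse=True)
    total = sum(ranked)
    zones = []
    journals = articles = cumulative = 0
    for count in ranked:
        journals += 1
        articles += count
        cumulative += count
        # Close a zone once the running total reaches the next equal share,
        # leaving the final zone open to absorb whatever remains.
        if len(zones) < n_zones - 1 and cumulative * n_zones >= total * (len(zones) + 1):
            zones.append((journals, articles))
            journals = articles = 0
    zones.append((journals, articles))
    return zones


def multipliers(zones):
    """Ratio of journal counts between successive zones (the Bradford multiplier)."""
    sizes = [journals for journals, _ in zones]
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


# Hypothetical data: articles contributed by each of 13 journals.
counts = [36, 12, 12, 12] + [4] * 9
zones = bradford_zones(counts)
print(zones)
print(multipliers(zones))
```

Real journal data rarely divide this cleanly, so students would normally compare several zone counts and report how stable the estimated multiplier is.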

A Syllabus for Teaching Bibliometrics


The appendix to this paper suggests tentative content and
emphases for an MLS course, together with (currently) desirable readings. It is noted that few (if any) students will have the time to read
everything listed, and so the onus is on the professor to map out a
manageable program based on local institutional objectives and priorities. Introductory remarks are presented for each major segment of the
proposed course in an attempt to identify progress and problems to date.
The remarks might furnish a starting point for lectures, or they might
be revised and distributed to students for reference.
The major course segments given in the syllabus are: (1) overview of
the field, one unit; (2) theoretical framework, two units; (3) research
traditions: laws and models, five units; (4) research traditions: empirical
descriptions, five units; and (5) applications for professional practice,
two units; for a total of fifteen units.

Future Prospects for Teaching Bibliometrics


The literature of bibliometrics is a rapidly growing one. In 1977
Voos²³ estimated there were 1400-2400 publications on the subject from
the nineteenth century to date. Pritchard published a 700-item interim
bibliography on bibliometrics for the period 1881-1969, and announced
in 1979 that he is compiling a far more extensive one of 3000-4000 items
as a byproduct of a research degree.²⁴ Hjerppe has published a bibliography of bibliometrics and citation indexing and analysis.²⁵ This work
indicates the growth of the literature and the international activity in
the field. It also suggests the need by any professor teaching bibliometrics to keep abreast of new research and to be prepared to discard any of
the above suggested readings as advances in theory and methodology are
made.
In evaluating the literature of bibliometrics and in helping to shape
future directions of bibliometric research, educators and researchers are
encouraged to emphasize the following problem areas: (1) theoretical
formulations to link social communication processes and cognitive
structures in a field to its literature; (2) research into information
exchange patterns, multiple and overlapping channels, and information demands; (3) citation behavior and citing theory; and (4) research
into the properties of varying fields within science and social science,
and between them and nonscience. Finally, it is suggested that less
priority be placed on mathematical modeling with limited variables,
and instead that more emphasis be directed to underlying multivariate
conceptual dimensions in order to construct a more adequate theory of
bibliometrics in the context of information transfer processes and
systems.²⁶

References
1. See Steiner, Elizabeth D. Logical and Conceptual Analytic Techniques for
Educational Researchers. Washington, D.C.: University Press of America, 1978; and
______. Notes on Methodology of Educational Theory Construction. Bloomington: Indiana University, 1981. Mimeographed.
2. Maccia, Elizabeth S., and Maccia, George S. Use of SIGGS Theory Model to
Characterize Educational Systems as Social Systems. In Man in Systems, edited by Milton
Rubin, p. 170. New York: Gordon and Breach, 1971.
3. Wert, Lucille M., ed. Directory Issue--1980. Journal of Education for
Librarianship, vol. 20, 1980.
4. Busha, Charles H., and Harter, Stephen P. Research Methods in Librarianship;
Techniques and Interpretation. New York: Academic Press, 1980.
5. Narin, Francis, and Moll, Joy K. Bibliometrics. Annual Review of Information Science and Technology 12(1977):35-38.
6. Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship 7(1977):299-335.
7. Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25(Dec. 1969):348-49.
8. Ferrante, Barbara K. Bibliometrics: Access in the Library Literature. Collection Management 2(Fall 1978):199.
9. Cole, F.J., and Eales, N.B. The History of Comparative Anatomy. Part 1: A
Statistical Analysis of the Literature. Science Progress 11(April 1917):578-96.
10. Hulme, Edward W. Statistical Bibliography in Relation to the Growth of
Modern Civilization. London: Grafton, 1923.
11. Lotka, Alfred J. The Frequency Distribution of Scientific Productivity.
Journal of the Washington Academy of Sciences 16(June 1926):317-23.
12. Bradford, Samuel C. Sources of Information on Specific Subjects. Engineering
137(26 Jan. 1934):85-86.
13. Gosnell, Charles F. Obsolescence of Books in College Libraries. College &
Research Libraries 4(March 1944):115-25.
14. Thomas, Lewis. The Medusa and the Snail; More Notes of a Biology Watcher.
New York: Bantam, 1974, p. 133.
15. Pritchard, Alan. Statistical Bibliography: An Interim Bibliography. London:
North-Western Polytechnic School of Librarianship, 1969, p. 1.
16. Aiyepeku, Wilson O. Bibliometrics in Information Science Curricula. The
Information Scientist 9(March 1975):29-34.
17. Siegel, Sidney. Nonparametric Statistics for the Behavioral Sciences. New York:
McGraw-Hill, 1956.
18. Pratt, Allan D. The Analysis of Library Statistics. Library Quarterly
45(1975):275-86.
19. Many of the suggested projects are from the list of assignments for Dr. L. Houser's
bibliometrics course, University of Toronto, spring 1981.
20. Sweaney, Wilma P. An Empirical Test of the Incompatibility of the Two
Formulations of Bradford's Law. MLS research report, Faculty of Library Science,
University of Toronto, 1978.
21. Lotka, Frequency Distribution.
22. Pratt, Allan D. A Measure of Class Concentration in Bibliometrics. Journal of
the ASIS 28(Sept. 1977):285-92.
23. Voos, Henry G. Bibliometrics and Management of Libraries. Proceedings of
the ASIS Annual Meeting 14(1977):fiche 9-E4-9-E6.
24. Pritchard, Statistical Bibliography; and ______. Announcement in Radials
Bulletin, no. 2 (1979), p. 149.
25. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing and
Analysis. Stockholm: Royal Institute of Technology Library, Dec. 1980.
26. The author wishes to thank Prof. L. Houser of the University of Toronto and
Prof. A. Pratt of the University of Arizona, Tucson (formerly of Indiana University) for
stimulating and supporting my interest in bibliometrics.

Appendix
BIBLIOMETRICS COURSE SYLLABUS*
1. Overview of the Field (1 unit)
This unit focuses on terminology, major concepts and reviews of the
literature.
Uncertainty about a variety of variables and their interconnections with
respect to scientific literatures was the impetus for bibliometric study. Some of
the initial questions were: Does the literature of a field represent the field? How
does the growth of a literature relate to the growth of scientific knowledge? What
are the essential characteristics constituting the structure of a literature? How do
various literatures compare with respect to structure? Who are the producers of a
literature? Who are its users? How are quantity and quality of literature production related? These and later, more complex questions have attracted the attention of increasing numbers of researchers and theoreticians in a wide spectrum
of academic disciplines. Among current difficult problems are: the functions of
referencing (intellectual property recognition, persuasion or window dressing);
the relationship between the cognitive structure of a discipline and its social
structure, particularly as manifested in communication and publishing patterns; and the theoretical validity of bibliometrics in scholarly nonscientific
fields.
The rapidly advancing status of bibliometrics as a scholarly specialty is
indicated by its large body of literature, now well over 2000 publications, by the
recent appearance of at least three journals, and by the attendant review literature. Particularly exciting is the international makeup of the research front,
comprising social scientists not only in the United States but also Russia,
Europe and England. Although bibliometric study began with the literatures of
the natural and biological sciences, social science literatures have also been
examined bibliometrically from time to time. In addition, there have been a
handful of attempts to apply the various techniques to some of the literatures of
the humanities disciplines.
Although there does not appear to be a consensus in the literature on the use
of the term bibliometrics, the various other descriptions represent subspecialty
thrusts. Recently, for example, Narin (1976) introduced the concept of evaluative bibliometrics, which he defined as the quantitative measurement of the
properties of a literature in order to evaluate scholarly activity in a field. In
addition, there is the term scientometrics, the scientific analysis of science and
science policy. The latter focus was embodied in the formation in late 1978 of
Scientometrics; An International Journal for all Quantitative Aspects of the
Science of Science and Science Policy. This is the second of three recent, relevant
journals. The first was Social Studies of Science; An International Review of
Research in the Social Dimensions of Science and Technology (earlier entitled
Science Studies, from its inception in 1971 until the end of 1974). The third
journal, although of very recent origin, shows promising relevance. It is entitled
Knowledge: Creation, Diffusion, Utilization, and is aimed at bringing together
researchers, policy-makers, research and development managers, and other
practitioners engaged in the process of knowledge development. Of course,
there are also a number of journals relevant to bibliometrics within the history
and philosophy of science in terms of theoretical implications, notably the
British Journal for the History of Science. Another important indicator of
bibliometric advance was the inauguration in 1975 of the Society for Social
Studies of Science, colloquially known as the 4S, which was reported to have
attracted over 500 members by the end of its first year.

*A reference to an author during discussion of a unit has been footnoted only if the
reference does not appear in the accompanying list of readings.

A comprehensive review of the literature of bibliometrics was published by
Narin and Moll (1977), and a survey of developments to date by Hjerppe. In
addition, more than thirty doctoral dissertations and several monographs on
various aspects of bibliometrics have been published; among the notable monographs are those by Price (1963, 1975), Narin (1976), Elkana (1978), Garfield
(1979), and Garvey (1979). (Two other monographs have attempted to present an
integrative overview of bibliometrics, those by Donohue and by Nicholas and Ritchie, but
neither has proven satisfactory.
The definitive text awaits an author.)
Narin (1976) has mapped out three research fronts in the literature of
bibliometrics (see table 1). They are: (1) the size of the scholarly enterprise; (2) the
properties (i.e., structure) of the literature of each enterprise; and (3) the productivity of scholarly authors.
Size of scholarly enterprise is generally expressed in terms of national or
international comparisons among literatures. Recently, attempts have been
made to correlate scientific productivity of a given country as indicated by its
scientific literature with national economic vitality. Such an index may become
particularly meaningful to the evaluation of progress in underdeveloped and
middle-power nations.
The structure of a literature is generally expressed in terms of relationships
among individual publications or among a set of publications such as journal
literature, in terms of links between researchers, or in terms of maps of disciplinary phenomena. These relationships and links and maps can be used to identify
key events, advances and patterns of scholarly research. Newer work such as
cocitation analysis and multidimensional scaling can be used for evaluative
functions as well as description, in comparing productivity among authors,
journals or organizational entities such as funding agencies, university departments, professional associations, or countries. Suggested readings for this unit
follow.
Terminology:
Ferrante, Barbara K. Bibliometrics: Access in the Library Literature. Collection Management 2(Fall 1978):199-204.
Garfield, Eugene. Scientometrics Comes of Age. Current Contents: Life
Sciences 1(12 Nov. 1979):5-10.
Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of Documentation 25(Dec. 1969):348-49.
Wittig, Glenn R. Statistical Bibliography-A Historical Footnote. Journal of
Documentation 34(Sept. 1978):240-41.

TABLE 1
CHRONOLOGY OF MAJOR CONTRIBUTORS TO THE DEVELOPMENT OF BIBLIOMETRIC ANALYSES OF SCIENTIFIC LITERATURES

Size of the Literature | Structure of the Literature | Productivity

1910
Cole and Eales

1920
Hulme
Lotka
Gross and Gross

1930
Bradford
Wilson and Fred
Cason and Lubotsky

1940
1950
Gosnell
(Bradford)
Fussler
Daniel and Louttit

1960
Kessler
Price
Bourne
Gottschalk and Desmond
Xhignesse and Osgood
Barr
Price
Narin and Carpenter
(Zipf)
Lehman
Garfield
Shockley
Westbrook
Price
Cole and Cole
Garfield
Narin, Carpenter and Berlt
Carpenter and Narin
Small and Griffith
Cox, Hamelman and Wilcox

Source: Narin (1976), adapted and slightly expanded.

Reviews of the literature:
Narin, Francis. In Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity (NTIS #PB 252 339).
Cherry Hill, N.J.: Computer Horizons, Inc., 1976, pp. 1-81.
______, and Moll, Joy K. Bibliometrics. Annual Review of Information Science and Technology 12(1977):35-58.
Texts:
Elkana, Y., et al., eds. Toward a Metric of Science: The Advent of Science
Indicators. New York: John Wiley, 1978.
Garfield, Eugene. Citation Indexing-Its Theory and Application in Science,
Technology, and Humanities. New York: John Wiley, 1979.
Garvey, William D. Communication: The Essence of Science; Facilitating
Information Exchange among Librarians, Scientists, Engineers, and Students. Toronto: Pergamon Press, 1979.

Holzner, Burkhart, and Marx, John H. Knowledge Application; The Knowledge System in Society. Boston: Allyn and Bacon, 1979.
Merton, Robert K. The Sociology of Science; Theoretical and Empirical Investigations. Chicago: University of Chicago Press, 1973.
Price, Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963.
______. Science Since Babylon. 2d ed. New Haven, Conn.: Yale University Press, 1975.
2. Theoretical Framework (2 units)
These units focus primarily on exogenous theory from the sociology of
science and from the history and philosophy of science. Recently, some promising indigenous contributions from information science have been published.
One of these is Pritchard (1972), who attempted to relate bibliometrics to the
information transfer process, conceptualizing the flow of information through
channels as analogous to a chemical or industrial process. Another is Meincke
and Atherton (1976), who have introduced the difficult but interesting concept
of knowledge space or scientific space, in which concepts, fields of knowledge,
and information items in a retrieval system are likened to physical objects (such
as atoms) that occupy multidimensional vector space.
However, while theoretical advances in the sociology of science have been
spectacular, little progress has occurred in our understanding of the nature of
theoretical properties of the vast array of subject literatures. For example, P e r i d
has argued, convincingly, that citation analysis cannot properly be applied to
historical research because citations representing the source documents for
history cannot be sorted out from citations representing ordinary references.
This may well have been the difficulty in the analysis by Brace of citation
patterns in graduate library school doctoral dissertations, a large proportion of
which have always been historical research. The same validity problem arises
with respect to citation analysis of literary criticism studies.
Theoretical uncertainty goes deeper than this, however, for what we really
need to understand better is under what conditions a literature structure may be
said to be isomorphic to the referencing behavior and norms of its producers.
Scientific literature is assumed to be isomorphic, or more nearly isomorphic, to
the referencing behavior of scientific authors because scientists produce knowledge by building on previous knowledge, and so they acknowledge the antecedent work, the intellectual property, of their colleagues. Thus, both the scientific
advances and the citing may be regarded as cumulative. Garfield, Malin and
Small (1978) suggest that citation linkages in science reflect both the cognitive
structure and the social structure of a specialty; this argument has not yet been
adequately elaborated for empirical testing, however.
Like this theoretical hypothesis, there are many other challenges awaiting
bibliometric inquiry. Some of these are to produce adequate explanations of the
following problems and phenomena: how progress in scientific knowledge can
be objectively identified, and how such progress is reflected in the literature;
how the social systems of science and nonscientific scholarship differ, and how
they reflect differing communication patterns, differing referencing practices
and norms, and differing publication practices; how patterns of information
exchange activity are related to the processes of scientific research, discovery,
dissemination, and utilization by scientists, and how these processes vary from
discipline to discipline or perhaps even from specialty to specialty; how the
nature of a research front should be determined (is it in the formal or informal
communication domain, and if in the formal, is it more accurately described as a
citation front, as Garvey (1979) has perceptively argued?); how the hardness-softness metaphor describing a continuum of scientific rigor can be either
operationalized and tested, or abandoned; how the identification of a susceptible in the epidemic theory of information diffusion proposed by Goffman and
Newill (1964) can be determined; how the nature of a citation can be defined (is
one citation to a paper equivalent to multiple citations to the same paper?); how
the nature of a reference is to be agreed upon (is a reference to a scientific paper
the same as a reference in historical inquiry and in literary criticism?); how
information transfer or information flow are to be treated; what the relationship
is between information, knowledge, ideas, and data; and finally, how the
dissemination of knowledge differs between the paper disciplines and the product disciplines (that is, between scientific and technological research activities),
and between them and the secret disciplines of military and industrial inquiry.
These are only some of the exciting theoretical problems before us. Suggested
readings for this unit follow.
Readings:
Ben-David, Joseph. Emergence of National Traditions in the Sociology of
Science; The United States and Great Britain. In Sociology of Science;
Problems, Approaches, and Research, edited by Jerry Gaston, pp. 197-218.
Washington, D.C.: Jossey-Bass, 1978.
Cole, Jonathan R., and Zuckerman, Harriet. The Emergence of a Scientific
Specialty: The Self-Exemplifying Case of the Sociology of Science. In The
Idea of Social Structure; Papers in Honor of Robert K. Merton, edited by
Lewis A. Coser, pp. 139-74. New York: Harcourt Brace Jovanovich, 1975.
Garfield, Eugene. Citation Indexes for Science; a New Dimension in Documentation through Association of Ideas. Science 122(15 July 1955):108-11.
______, et al. Citation Data as Science Indicators. In Toward a Metric of
Science: The Advent of Science Indicators, edited by Y. Elkana, et al., pp.
179-207. New York: John Wiley, 1978.
Gilbert, G. Nigel. The Transformation of Research Findings into Scientific
Knowledge. Social Studies of Science 6(1976):281-306.
______. Measuring the Growth of Science; A Review of Indicators of
Scientific Growth. Scientometrics 1(1978):9-34.
______, and Woolgar, Steve. The Quantitative Study of Science: An
Examination of the Literature. Science Studies 4(July 1974):279-94.
Goffman, William, and Newill, V.A. Generalisation of Epidemic Theory: An
Application to the Transmission of Ideas. Nature 204(Oct. 1964):225-28.
Heyl, John D. Paradigms in Social Science. Society 12(July-Aug. 1975):61-67.
Kuhn, Thomas S. The Structure of Scientific Revolutions. 2d ed. Chicago:
University of Chicago Press, 1970.
Lakatos, Imre, and Musgrave, Alan. Criticism and the Growth of Knowledge.
Cambridge: University Press, 1970.
Laudan, Larry. Progress and Its Problems: Toward a Theory of Scientific
Growth. Berkeley: University of California Press, 1978.

Meincke, Peter P.M., and Atherton, Pauline. Knowledge Space: A Conceptual
Basis for the Organization of Knowledge. Journal of the ASIS 27(Jan.-Feb.
1976):18-24.
Merton, Robert K. Priorities in Scientific Discovery. Reprinted in The
Sociology of Science; Theoretical and Empirical Investigations. Chicago:
University of Chicago Press, 1973, pp. 286-324.
______. The Matthew Effect in Science. In The Sociology of Science, pp.
439-59.
Popper, Karl R. Conjectures and Refutations: The Growth of Scientific Knowledge. New York: Harper, 1963.
______. Objective Knowledge; An Evolutionary Approach. London:
Oxford University Press, 1972.
Price, Derek de Solla. The Revolution in Mapping of Science. Proceedings of
the ASIS Annual Meeting 16(1979):249-53.
Pritchard, Alan. Bibliometrics and Information Transfer. Research in
Librarianship 4(1972):37-46.
Rescher, Nicholas. Scientific Progress; A Philosophical Essay on the Economics
of Research in Natural Science. Pittsburgh: University of Pittsburgh Press,
1978.
3. Research Traditions: Laws and Models (5 units)

This section is prefaced by an introduction to logarithmic distributions and
nonparametric statistical procedures. This is necessary because bibliometric
data have been found to exhibit geometric or exponential properties of growth
and decline, rather than arithmetic properties.
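
A minimal sketch of why logarithms matter here: if a literature grows by a constant ratio each year, its logarithm grows along a straight line, so a least-squares line fitted to the logged counts gives the growth rate and the doubling time. The yearly counts below are invented for illustration, and the helper name `fit_exponential_growth` is an assumption of this sketch.

```python
import math


def fit_exponential_growth(years, counts):
    """Fit counts ~ a * exp(r * year) by ordinary least squares on log(counts).

    Returns (r, doubling_time), where doubling_time = ln(2) / r.
    """
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    rate = sxy / sxx
    return rate, math.log(2) / rate


# Hypothetical literature whose annual output doubles every five years.
years = [1950, 1955, 1960, 1965, 1970]
counts = [100, 200, 400, 800, 1600]
rate, doubling = fit_exponential_growth(years, counts)
print(round(rate, 4), round(doubling, 2))
```

Plotting the logged counts against the year is the graphical equivalent: a straight line suggests exponential growth, while curvature suggests some other process.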
From the bibliometrics literature, there is a strong impression that two
research traditions have developed, more or less independently though concurrently. The one tradition is characterized by investigation into distributional
properties, typically culminating in the formulation of a statistical law or a
mathematical model of the logarithmic variety. This tradition derives from
Lotka, Bradford and Zipf, and is represented by such researchers as Bookstein,
Brookes, Coile, Fairthorne, Goffman, Kendall, Leimkuhler, O'Neill, Pratt,
Vickery, Vlachý, and Wilkinson.
The other research tradition is more strictly empirical, focusing on counts
of data and on first-order relationships among sets of data such as cocitation
mapping describes. Notable contributors in this tradition are Fussler, Garfield,
Griffith, Kessler, Line, Mullins, Narin, Price, Sandison, and Small. In passing,
it should be noted that the creation of Science Citation Index, Social Sciences
Citation Index and Arts & Humanities Citation Index by the Institute for
Scientific Information in Philadelphia has vastly accelerated the potential
advance of knowledge through the empirical tradition.
Bibliometric measures in general focus less on the central tendency of a
distribution of data and much more on the extremes which characterize the
distribution. Also, bibliometric measures are based on the frequency ranking of
data, in most cases. However, if the essential information in the data is to be
preserved and evaluated, nonparametric statistical tests for rank-ordered data
cannot be utilized because such tests do not adequately preserve the magnitude
of differences between rankings. Other nonparametric approaches must be
devised, so that the typical high concentration of data in a relatively small
proportion of the population can be represented.
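
One simple way to represent such concentration, offered here only as a sketch and not as any of the published measures, is to report the share of all items accounted for by the most productive fraction of sources. The function name `top_share` and the sample counts are assumptions.

```python
import math


def top_share(counts, fraction):
    """Share of all items produced by the top `fraction` of sources.

    `counts` holds one item count per source; `fraction` lies in (0, 1].
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(counts, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical data: papers per author for ten authors.
papers = [40, 20, 10, 8, 6, 5, 4, 3, 2, 2]
print(top_share(papers, 0.2))  # share held by the two most productive authors
```

Unlike a rank-order test, this summary keeps the magnitudes of the counts, which is the property the paragraph above asks for.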
There is still a great deal of investigation required into the underlying
theoretical dimensions of the mathematical formulations expressed in Lotka's
law, Bradford's law and Zipf's law. Various explanations to date have proposed
a law of diminishing returns model, a cumulative or comparative advantage
model issuing from the more generalized theory of stochastic processes, and an
information theoretic model of the human mind. However, as Bookstein (1979)
noted in a recent critique of the current views, these various models and laws all
turn out to be mathematically identical, and this in itself is an interesting
finding that invites investigation.
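
To make the Lotka case concrete, the sketch below computes the author counts expected under the inverse-square form usually attributed to Lotka, in which the number of authors with n papers is proportional to 1/n^2. The exponent of 2, the function name `lotka_expected`, and the sample size are assumptions for illustration; empirical literatures often call for a fitted exponent.

```python
import math


def lotka_expected(total_authors, max_papers, exponent=2.0):
    """Expected number of authors with 1..max_papers papers under y(n) = C / n**exponent.

    C is chosen so that the expected counts over 1..max_papers sum to total_authors.
    """
    weights = [1 / n ** exponent for n in range(1, max_papers + 1)]
    scale = total_authors / sum(weights)
    return [scale * w for w in weights]


expected = lotka_expected(total_authors=1000, max_papers=10)
for n, y in enumerate(expected, start=1):
    print(n, round(y, 1))

# With no upper limit on n, the single-paper share tends to 6 / pi**2 (about 61%).
print(round(6 / math.pi ** 2, 4))
```

Comparing such expected counts with observed counts is one way for students to see how well, or how poorly, a given literature follows the inverse-square form.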
There is also a great deal of investigation required into methodological
validity. Coile (1977) has documented several misuses of Lotka's law, for example, and Wilkinson (1972) has pointed out that no two researchers have interpreted Bradford's law in the same way. Some of the current questions are:
whether these distributions are properly described as laws at all rather than
simply probabilistic occurrences; whether Bradford's law is reliable for small
collections, what small means, and whether a collection can be one journal or
whether a broad base of journals is required; whether Bradford's law is biased
toward journals that publish a large number of very short papers; whether
sample size is a factor in making comparisons of scattering characteristics across
fields; whether Bradford's law can be explained as an artifact of journal editorial
policy, as Fairthorne (1969) has speculated; and whether the performance of new
journals, papers and authors can be predicted. Related issues are whether the
investigation of one or two variables without a research hypothesis, as is the case
with the empirical descriptions discovered by Bradford, Lotka and Zipf, constitutes an adequate basis for quantitative inquiry, and whether multivariate
bibliometric analyses would be more fruitful. Suggested readings for this unit
follow.
Logarithms:
Aitchison, J., and Brown, J.A.C. The Lognormal Distribution. Cambridge:
University Press, 1957.
Pratt, Allan D. The Analysis of Library Statistics. Library Quarterly
45(1975):275-86.
Bradford and Zipf:
Bradford, Samuel C. Sources of Information on Specific Subjects.
Engineering 137(26 Jan. 1934):85-86.
______. The Documentary Chaos. In Documentation, pp. 106-21.
London: Crosby Lockwood, 1948.
Brookes, Bertram C. The Complete Bradford-Zipf Bibliograph. Journal of
Documentation 25(March 1969):58-60.
______. Theory of the Bradford Law. Journal of Documentation
33(Sept. 1977):180-209.
Hubert, John J. A Relationship between Two Forms of Bradford's Law.
Journal of the ASIS 29(May 1978):159-61.
Praunlich, Peter, and Kroll, M. Bradford's Distribution: A New Formulation.
Journal of the ASIS 29(March 1978):51-55.

Sweaney, Wilma P. An Empirical Test of the Incompatibility of the Two Formulations of Bradford's Law (MLS research report, Faculty of Library
Science). Toronto: University of Toronto, 1978.
Vickery, B.C. Bradford's Law of Scattering. Journal of Documentation
4(1948):198.
Wilkinson, E.A. The Ambiguity of Bradford's Law. Journal of Documentation 28(June 1972):122-30, 232 (erratum).
Lotka:
Allison, Paul D., et al. Lotka's Law: A Problem in Its Interpretation and Application. Social Studies of Science 6(1976):269-76.
Coile, Russell C. Lotka's Frequency Distribution of Scientific Productivity.
Journal of the ASIS 28(Nov. 1977):366-70.
Lotka, Alfred J. The Frequency Distribution of Scientific Productivity.
Journal of the Washington Academy of Sciences 16(19 June 1926):317-23.
Vlachý, Jan. Frequency Distributions of Scientific Performance; A Bibliography of Lotka's Law and Related Phenomena. Scientometrics
1(1978):109-30.
Recent advances:
Bookstein, Abraham. Explanations of the Bibliometric Laws. Collection
Management 3(Summer-Fall 1979):151-62.
Fairthorne, Robert A. Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction. Journal of
Documentation 25(Dec. 1969):319-43.
Garfield, Eugene. Bradford's Law and Related Statistical Patterns. Current
Contents: Life Sciences 2(12 May 1980):5-12.
Pratt, Allan D. A Measure of Class Concentration in Bibliometrics. Journal of
the ASIS 28(Sept. 1977):285-92.
Price, Derek de Solla. A General Theory of Bibliometric and Other Cumulative
Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
______. Cumulative Advantage Urn Games Explained: A Reply to
Kantor. Journal of the ASIS 29(July 1978):204-06.
Shaw, W.M. Entropy, Information and Communication. Proceedings of the
ASIS Annual Meeting 16(1979):32-40.
4. Research Traditions: Empirical Descriptions (5 units)

This section covers publication counting and citation analysis. Simple
one-to-one citation links and the notion of bibliographic coupling were typical
empirical approaches in the 1960s and before, but in the following decade the
concept of cocitation clustering was invented and came to dominate the bibliometrics research front. The cocitation clustering technique has exciting potential for mapping the structure of scientific specialties and perhaps even entire
fields of science, and for documenting changes and growth over time. Studies
into the validity and limitations of citation analysis are also reviewed; contributions here are content analysis and typologies of citations, sometimes referred to
as context analysis, and correlational analysis of citations with other quantitative and qualitative measures.
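
The difference between bibliographic coupling and cocitation can be sketched in a few lines of Python. Two citing papers are coupled when they share references; two cited papers are cocited when they appear together in the same reference list. The paper labels and reference lists below are hypothetical, and the function names are assumptions of this sketch; clustering proper would require a further step applied to the cocitation counts.

```python
from collections import Counter
from itertools import combinations


def coupling_strength(references):
    """Number of shared references for every pair of citing papers."""
    strength = Counter()
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strength[(a, b)] = shared
    return strength


def cocitation_counts(references):
    """Number of citing papers whose reference lists contain both cited papers."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists: citing paper -> papers it cites.
refs = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C", "D"],
}
print(coupling_strength(refs))
print(cocitation_counts(refs))
```

Coupling strength is fixed once the citing papers are published, whereas cocitation counts can keep changing as new papers cite the older pair together.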

Scholarly norms of citing are complex and vary from field to field and from
science to nonscience. Similarities in citing conventions between scientific
literatures and humanities literatures are not adequately understood at all, but
the social conventions determining citing behavior in a given field are crucial to
theoretically valid characterizations of the structure of the field's literature.
The citing of antecedent research is a strong social norm among scientists
and social scientists. Citation relationships are conceptualized as semantic
relations between texts that constitute directed lines connecting later to earlier
work. When these relations are graphed, they are said (borrowing from graph
theory) to form a digraph. Such a digraph reflects semantic textual structures
such that antecedent subject matter is linked to later subject matter. Citation
analysis relies on the occurrence of the social norms of citing, but there are many
other reasons for particular choices of prior authors and papers. As Lipetz (1965)
and Weinstock (1974), among others, have noted, these choices could be motivated by any of the following: paying homage to pioneers; providing background reading; giving an example; modifying, correcting, criticizing, or
refuting previous work; identifying the original publication of an eponymic
concept or term such as Pareto's law; or window dressing. Refinements in
citation analysis methodology are now being produced through contextual
analysis of references. Also, studies have been undertaken in science to assess the
correlation between citation data and peer judgments. Cole and Cole (1973) and
Zuckerman (1977), among others, have demonstrated that straight citation
counts are highly correlated with virtually every refined measure of research
quality and other forms of scientific recognition, such as the Nobel prize and
membership in a national academy of science.
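As a minimal sketch of the digraph view described above, citation links can be stored as directed edges from later to earlier work. The papers and links here are invented, and the helper names are hypothetical; a straight citation count is then the number of incoming edges for each paper.

```python
# Hypothetical citation links: (citing paper, cited paper), each directed
# from the later work to the earlier work.
citations = [
    ("D", "A"), ("D", "B"),
    ("C", "A"),
    ("B", "A"),
]


def build_digraph(edges):
    """Return adjacency sets: citing paper -> set of papers it cites."""
    graph = {}
    for citing, cited in edges:
        graph.setdefault(citing, set()).add(cited)
        graph.setdefault(cited, set())  # cited-only papers are nodes too
    return graph


def citation_counts(graph):
    """Straight citation count: number of incoming links for each paper."""
    counts = {node: 0 for node in graph}
    for cited_set in graph.values():
        for cited in cited_set:
            counts[cited] += 1
    return counts


def antecedents(graph, paper):
    """All earlier work reachable by following citation links from a paper."""
    seen = set()
    stack = [paper]
    while stack:
        node = stack.pop()
        for cited in graph[node]:
            if cited not in seen:
                seen.add(cited)
                stack.append(cited)
    return seen


graph = build_digraph(citations)
counts = citation_counts(graph)
```

The counts say nothing about why a paper was cited; the motives listed above (homage, criticism, window dressing) are invisible at this level, which is the gap that context analysis tries to fill.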
Thus, although errors or deviations in citing behavior do occur, the
accumulation of bibliographic links over hundreds or even thousands of acts of
citing over time is seen to map out the cognitive domain of scientific knowledge
in a given area; the self-correcting and cumulating nature of knowledge is a
probabilistic process that sloughs off the errors or deviations and dead-end
research programs. In effect, when an author cites he is classifying his own work
with respect to the perceived domain of all prior scholarship.
What lends further credence to the validity of citation analysis, at least in
science, is the consensus factor; that is, the journal-refereeing system requires a
consensus among selected scholars on the worth of the work being submitted for
publication, and one of the criteria for judging such worth is coherence with
past research, presumably as represented by the researcher's choice of citations to
antecedent work. However, it should also be noted that citation anomalies
having a small effect on the average might have serious distorting effects in a
particular instance, for example, anomalies such as obliteration, eponyms and
highly unpopular claims like those of Arthur Jensen.
Thus, citing theory is in its infancy. Among the factors influencing the
nature and frequency of citation are the following: the size of the field and
number of authors in a field; the nature of the field, especially its degree of
theoretical integration or codification; whether a field is a paper- or product-producer, and especially what proportion of a field may be said to be engaged in
secret research, such as for military and industrial organizations; the age of a
field; differing growth rates of fields; journal editorial policies, such as rates of
publication, language of publication, length of articles; journal function (e.g.,
reporting research or current awareness); journal quality and prestige; author
eminence; average number of references per journal article; the degree of anomalous citation behavior in a field; perceived social utility of the field and funding
for research; rates of multiple versus single citation to a paper; rates of multiple
versus single authorship; variability in quality and importance of papers;
relationships between obsolescence and changes in journal size; and above all,
differential reference functions and norms among the sciences, social sciences,
technological fields, and the nonsciences. Suggested readings for this unit
follow.
Citation analysis:
Cawkell, A.E. Understanding Science by Analysing Its Literature. The
Information Scientist 10(March 1976):3-10.
Cole, J.R., and Cole, S. Social Stratification in Science. Chicago: University
of Chicago Press, 1973.
Garfield, Eugene. The Obliteration Phenomenon in Science-and the
Advantage of Being Obliterated! Current Contents: Life Sciences 18(22 Dec.
1975):5-7.
Garfield, Eugene. Citation Analysis and the Anti-Vivisection Controversy.
Current Contents: Life Sciences 20(25 April 1977):5-10; and Citation Analysis and the Anti-Vivisection Controversy. Part II. An Assessment of Lester R.
Aronson's Citation Record. Current Contents: Life Sciences 20(28 Nov.
1977):5-14.
Garfield, Eugene. Restating the Fundamental Assumptions of Citation Analysis.
Current Contents: Life Sciences 20(26 Sept. 1977):5-6.
Garfield, Eugene. High Impact Science and the Case of Arthur Jensen. Current
Contents: Life Sciences 21(9 Oct. 1978):5-15.
Garfield, Eugene. Is Citation Analysis a Legitimate Evaluation Tool? Scientometrics 1(1979):359-75.
Gilbert, G. Nigel. Referencing as Persuasion. Social Studies of Science
7(Feb. 1977):113-22.
Griffith, Belver C., et al. On the Use of Citations in Studying Scientific
Achievements and Communication. Society for Social Studies of Science
Newsletter 2 (Summer 1977):9-13.
Kaplan, Norman. The Norms of Citation Behavior: Prolegomena to the Footnote. American Documentation 16(July 1965):179-84.
Line, Maurice B., and Sandison, Alexander. Obsolescence and Changes in
the Use of Literature with Time. Journal of Documentation 30(Sept.
1974):283-350.
Porter, Alan L. Citation Analysis: Queries and Caveats. Social Studies of
Science 7(1977):257-67.
Price, Derek de Solla. The Citation Cycle. In North American Networking,
(collected papers, ASIS 8th mid-year meeting, Banff, May 1979), edited by
A.B. Piternick. Washington, D.C.: ASIS, 1979.
Small, Henry G. Co-citation in the Scientific Literature: A New Measure of
the Relationship between Two Documents. Journal of the ASIS 24(July-Aug. 1973):265-69.
Small, Henry G. Cited Documents as Concept Symbols. Social Studies of Science
8(Aug. 1978):327-40.
Voos, Henry G., and Dagaev, Katherine. Are All Citations Equal? Or, Did We
Op. Cit. Your Idem? Journal of Academic Librarianship 1(Jan. 1976):19-21.
Zuckerman, Harriet. Scientific Elite. New York: Free Press, 1977.
Context analysis:
Bertram, Sheila J.K. The Relationship Between Intra-Document Citation
Location and Citation Level. Ph.D. diss., University of Illinois at Urbana-Champaign, 1970.
Chubin, Daryl E., and Moitra, Soumyo D. Content Analysis of References:
Adjunct or Alternative to Citation Counting? Social Studies of Science
5(1975):423-41.
Lipetz, Ben-Ami. Improvement of the Selectivity of Citation Indexes to Science
Literature through Inclusion of Citation Relationship Indicators.
American Documentation 16(1965):81-90.
Moravcsik, Michael J., and Murugesan, P. Some Results on the Function and
Quality of Citations. Social Studies of Science 5(1975):86-92.
Murugesan, P., and Moravcsik, Michael J. Variation of the Nature of Citation
Measures with Journals and Scientific Specialties. Journal of the ASIS
29(May 1978):141-47.
Small, Henry G. Co-citation Content Analysis: The Relationship between
Bibliometric Structure and Knowledge. Proceedings of the ASIS Annual
Meeting 16(1979):276-85.
Spiegel-Rösing, Ina. Science Studies: Bibliometric and Content Analysis.
Social Studies of Science 7(1977):97-113.
Weinstock, Melvin. ISI's Social Sciences and Humanities Citation Index. In
Access to the Literature of the Social Sciences and Humanities. New York:
Queens College Press, 1974.
5. Applications for Professional Practice (2 units)

There is a great deal of controversy about the appropriateness of bibliometric applications to practical problems. Some authors have argued that underlying theoretical explanations of the bibliometric distributions are too weak to
guide information facility policy decisions, that bibliometric theory is not ready
for practical application. Others have urged even greater application, particularly to library collection management. Several reviews have been published,
notably those of Broadus (1977), Buckland (1978), Fitzgibbons (1980), and
Lancaster (1977). Moll edited a special issue in 1978 of Collection Management
devoted to bibliometrics in library collection management.
However, a number of major application problems have not been adequately addressed in the bibliometrics literature. First, most of the mathematical
models which have been proposed are static models, i.e., they assume fixed
economic conditions, for example, with respect to journal acquisitions costs
versus interlibrary loan costs, fixed subject areas, fixed user interests and homogeneous information demands, and fixed information facility objectives and
policies. Second, the models are simplistic and do not adequately reflect reality
in that they assume-but are unable to demonstrate operationally-that user
satisfaction can be defined and measured, and that individual user dissatisfaction is unimportant to the advance of scholarship. Third, the mathematical
models have weak explanatory power. They are unable, for example, to predict
the performance of new journals, new researchers and new papers. Fourth, the
variables in the models are only vaguely linked to sociological concepts. For
example, citation analysis treats the formal communication process, while use
and user studies concern demands on an information facility. Are identical or
highly dissimilar processes and modes of social communication behavior thus
being measured? How valid is the assumption that citations reflect information
facility use patterns? Fifth, almost all information facility objectives and, in
particular, collection policies are so unclearly expressed that they boil down to
assertions that cannot be operationalized and tested. Fundamental concepts
such as information need, user satisfaction, and even information facility use,
are inadequately articulated. Until information facilities begin to support
development inquiry on a grand scale, with funds for researchers rather than for
computers and computer applications, progress in applying bibliometric theory will be very slow. Finally, almost all the models and bibliometric explanations to date have been focused on scientific journal literatures, scientific
information facilities, and scientific researchers. More work is needed to determine what form practical applications should take in public and academic
libraries as they are presently constituted, with amorphous, heterogeneous user
populations exhibiting highly diversified demand patterns.
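The question raised above, whether citations reflect information facility use patterns, is often examined by comparing two rank lists of the same journals. The following is a minimal sketch using Spearman's rank correlation, computed with the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1)); the journal labels and all counts are invented for illustration and do not come from any study cited here.

```python
def ranks(scores):
    """Rank items from 1 (highest score) downward; assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


def spearman_rho(x, y):
    """Spearman rank correlation for two score dicts with identical keys."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(rank_x)
    d_squared = sum((rank_x[k] - rank_y[k]) ** 2 for k in rank_x)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data for five journals in one library.
citations_received = {"J1": 120, "J2": 85, "J3": 40, "J4": 15, "J5": 5}
local_uses = {"J1": 30, "J2": 55, "J3": 20, "J4": 25, "J5": 2}

rho = spearman_rho(citations_received, local_uses)
```

A coefficient near 1 would suggest that a citation-based rank list is a fair proxy for local use; a low value would support the caution expressed above about transferring citation data to collection decisions in a particular library.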
These are some of the difficult but challenging problems ahead. Suggested
readings for this unit follow.
Reviews of the literature:
Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship 7(1977):299-335.
Buckland, Michael K. Ten Years' Progress in Quantitative Research on
Libraries. Socio-Economic Planning Sciences 12(1978):333-39.
Fitzgibbons, Shirley A. Citation Analysis in the Social Sciences. In Collection Development in Libraries: A Treatise, edited by George B. Miller and
Robert D. Stueart, pp. 291-344. Greenwich, Conn.: JAI Press, 1980.
Lancaster, F. Wilfrid. The Measurement and Evaluation of Library Services.
Washington, D.C.: Information Resources Press, 1977, pp. 327-67.
Moll, Joy K., ed. Special Issue on Bibliometrics. Collection Management,
vol. 2, Fall 1978.
Readings:
Allen, Edward S. Periodicals for Mathematicians. Science 70(20 Dec.
1929):592-94.
Baughman, James C. Towards a Structural Approach to Collection Development. College & Research Libraries 38(May 1977):241-48.
Bourne, C.P. Some User Requirements Stated Quantitatively in Terms of the
90 Percent Library. In Electronic Information Handling, edited by A. Kent
and O.E. Taulbee, pp. 93-110. Washington, D.C.: Spartan Books, 1965.
Drott, M. Carl, et al. Bradford's Law and Libraries: Present Applications-Potential Promise. Aslib Proceedings 31(June 1979):296-304.
Garfield, Eugene. Citation Analysis as a Tool in Journal Evaluation. Science
178(Nov. 1972):471-79.
Garfield, Eugene. No-Growth Libraries and Citation Analysis; or, Pulling Weeds
with ISI's Journal Citation Reports. Current Contents: Life Sciences
18(30 June 1975):5-8.
Goffman, William, and Morris, T.G. Bradford's Law and Library Acquisitions. Nature 226(6 June 1970):922-23.
Gosnell, Charles F. Obsolescence of Books in College Libraries. College &
Research Libraries 4(March 1944):115-25.
Gross, P.L.K., and Gross, E.M. College Libraries and Chemical Education.
Science 66(28 Oct. 1927):385-89.
Line, Maurice B. Rank Lists Based on Citations and Library Uses as Indicators
of Journal Usage in Individual Libraries. Collection Management
2(Winter 1978):313-16.
Line, Maurice B., and Sandison, Alexander. Practical Interpretation of Citation
and Library Use Studies. College & Research Libraries 36(Sept. 1975):393-96.
Pritchard, Alan. Citation Analysis vs. Use Data. Journal of Documentation
36(Sept. 1980):268-69.
Raisig, L. Miles. Statistical Bibliography in the Health Sciences. Bulletin
of the Medical Library Association 50(July 1962):450-61.
Subramanyam, K. Criteria for Journal Selection. Special Libraries 66(Aug.
1975):367-71.
Trueswell, Richard. Some Behavioral Patterns of Library Users: The 80/20
Rule. Wilson Library Bulletin 43(1969):459, 461.
Turner, Stephen J. Trueswell's Weeding Technique: The Facts. College &
Research Libraries 41(March 1980):134-40.
Voos, Henry G. Bibliometrics and Management of Libraries. Proceedings of
the ASIS Annual Meeting 14(1977):fiche 9-E4-9-E6.

Notes
1. Hjerppe, Roland. An Outline of Bibliometrics and Citation Analysis.
Stockholm: Royal Institute of Technology, 1978.
2. Donohue, Joseph C. Understanding Scientific Literature: A Bibliometric
Approach. Cambridge, Mass.: MIT Press, 1973.
3. Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. London:
Clive Bingley, 1978.
4. For reviews of Donohue's monograph, see: American Libraries 5(July-Aug.
1974):368; Brookes, Bertram C. Nature 249(May 1974):496-97; Dikeman, R.K. American
Reference Books Annual 6(1975):138-39; Lancaster, F. Wilfrid. Newsletter on Library
Research, no. 11 (March 1974), pp. 7-11; Narin, Francis, and Voos, Henry. Journal of the
ASIS 26(March-April 1975):129; Rosenberg, Betty. Information Storage and Retrieval
10(Dec. 1974):420-21; Swisher, Robert. RQ 14(Fall 1974):75-76; Vaillancourt, Pauline M.
Library Journal 99(Sept. 1974):2045; and Wilkinson, Elizabeth. Journal of Documentation 30(Dec. 1974):438. For reviews of Nicholas and Ritchie's monograph, see: Culnan,
Mary J. Information Processing and Management 15(1979):170; and Morrison, Perry D.
College & Research Libraries 39(Sept. 1978):414-15.
5. Peritz, B. Cheila. Research in Library Science as Reflected in the Core
Journals of the Profession: A Quantitative Analysis (1950-1975). Ph.D. diss., Florida
State University, 1978.
6. Brace, William. A Citation Analysis of Doctoral Dissertations in Library and
Information Science, 1961-1970. Ph.D. diss., Case Western Reserve University, 1975.


