Citation Analysis
PRODUCTION NOTE
University of Illinois at
Urbana-Champaign Library
Library Trends
VOLUME 30 NUMBER 1
SUMMER 1981
University of Illinois
Bibliometrics
CONTENTS
Charles H. Davis
3 FOREWORD
Daniel O. O'Connor
Henry Voos
Ronald E. Wyllys
INTRODUCTION
John J. Hubert
65
Linda C. Smith
83 CITATION ANALYSIS
D. Kaye Gapen
Sigrid P. Milner
107 OBSOLESCENCE
Jean Tague
Jamshid Beheshti
Lorna Rees-Potter
Alvin M. Schrader
Foreword
Introduction
WILLIAM GRAY POTTER
BIBLIOMETRICS IS, simply put, the study and measurement of the publication patterns of all forms of written communication and their
authors. Though the word is of recent coinage, the practice goes back at
least to the 1920s.
There has been a great increase in the number of publications in
bibliometrics over the past two decades. This increase has not been
accompanied by critical analyses of the field and of the direction of
bibliometrics in general. The purpose of this issue of Library Trends is
to provide analyses of the major concepts of bibliometrics and to indicate its present and future directions. An effort has been made to make
the articles in this issue understandable to persons new to the topic
without depriving those readers already initiated into the mysteries of
bibliometrics of new insights and a measure of controversy. The authors
of these articles are knowledgeable in their topics, but, with a few
exceptions, are not usually associated with bibliometrics. These authors
were chosen to bring some new names and, it is hoped, new ideas to the
literature.
In a general introduction to bibliometrics, Daniel O'Connor and
Henry Voos argue that because bibliometrics has largely been used only
to describe bibliographic phenomena, and is not yet able to explain or
predict these phenomena, it is merely a method, not a theory. They state
that if bibliometrics is to attain the status of a theory, to be able to predict
and explain, and, thus, to become more useful, researchers must concentrate on the causal factors underlying bibliographic phenomena.
William Gray Potter is Acquisitions Librarian, University of Illinois at Urbana-Champaign.
The next four articles deal with the three major laws of
bibliometrics (Lotka's law, Bradford's law, and Zipf's law) and with
attempts to unify these individual laws under one general distribution.
William Potter provides a bibliographic history of Lotka's law and its
application. M. Carl Drott examines Bradford's law and concludes that
more work is needed in exploring the underlying causes behind Bradford's observations. Ronald E. Wyllys provides a discussion of the
origins of Zipf's law, with some interesting observations on the character and context of Zipf himself. John J. Hubert examines efforts to join
the laws of Lotka, Bradford and Zipf into one unified, general model.
While he finds these attempts statistically sound, Hubert faults them for
being too simple, usually with only one dependent variable, and points
to research that attempts to account for more variables and which may
provide more accurate, predictive and useful models.
Citation analysis is perhaps the most written-about topic in bibliometrics. Linda C. Smith provides an extensive review of the literature
and discusses the practical applications of citation analysis.
The rate at which literature becomes obsolete is of interest to both
the information scientist studying the evolution of disciplines and to
practicing librarians concerned with collection management. D. Kaye
Gapen and Sigrid P. Milner have prepared a detailed review of research
in obsolescence.
There has been exponential growth in the number of publications
and it is widely believed that knowledge is also growing, though not at
the same rate as publications. Jean Tague, Jamshid Beheshti and Lorna
Rees-Potter discuss the relationship between the growth of literature
and the growth of knowledge.
Throughout the articles in this issue, there is a recurring theme
which, in essence, says that the traditional bibliometric models and
distributions are too simple to reflect reality accurately. To be useful,
bibliometrics must be able to explain and predict phenomena, not just
to describe them. To do this, more complex models are needed. The
problem is that bibliometrics is already thought too difficult and out of
the reach of most librarians and information scientists. One possible
solution is to incorporate bibliometrics into library and information
science curricula. Alvin M. Schrader discusses how a course on bibliometrics might be taught and provides a sample syllabus.
In addition to the contributors, I would like to credit the following
people for their contributions to this issue: Charles Davis for his encouragement and guidance; Michael Gorman, Bernard Hurley, Rebecca
Lenzini, Daniel O'Connor, and Charlene Renner for their editorial
advice and assistance; Wendy Darre and Lisa Olson for their willingness
to type and retype seemingly endless tables and bibliographies; and,
finally, the editorial staff of Library Trends for their usual excellent
job.
References
1. Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25 (Dec. 1969):348-49.
2. Hulme, E. Wyndham. Statistical Bibliography in Relation to the Growth of
Modern Civilization. London: 1923.
BIBLIOMETRICS HAS COMMANDED the attention of numerous individuals
in library and information science. The measurement of bibliographic
information offers the promise of providing a theory that will resolve
many practical problems. It is claimed that patterns of author productivity, literature growth rates and related statistical distributions can be
used to evaluate authors, assess disciplines and manage collections. Yet,
it is unclear if bibliometrics is merely a method or if it meets the test of a
theory in its ability to explain and predict phenomena. This paper
examines the properties of bibliometric distributions in a nontechnical
manner.
Twelve years ago, Pritchard coined the term bibliometrics and
defined it as the application of mathematics and statistical methods to
books and other media of communication. Its purpose was:
1. To shed light on the processes of written communication and of
the nature and course of development of a discipline (in so far as this is
displayed through written communication), by means of counting
and analyzing the various facets of written communication ...;
2. The assembling and interpretation of statistics relating to books
and periodicals ...to demonstrate historical movements, to determine
the national or universal research of books and journals, and to
ascertain in many local situations the general use of books and
journals.2
DANIEL O'CONNOR & HENRY VOOS
during the past five years. It might be hypothesized that these external
events have influenced the rate of author productivity in librarianship
over the past decade.
The Bradford distribution (or Law of Scatter) groups journals and
articles to identify the number of periodicals relevant to a particular
subject. Its computation is based on the total number of articles published by the journals in a particular subject area. A constant is then
computed for that subject area, which is used to determine the percentage of total coverage by various numbers of journals in a field. One
formula for this is:
\[ R(n) = N \log\frac{n}{s}, \qquad 1 \le n \le N \]
where
R(n) = total number of journal articles (cumulated over the n most productive journals)
N = total number of journals
s = a constant (specific to a subject area).27
For example, Brookes applies this formula to a scientific literature
which yielded a total of 2000 articles from 400 journals. The results
indicate that 40 percent of the articles are contained in 5 percent of the
journals. Further, 80 percent of the articles are contained in 37 percent of
the journals.28 A core of journals is thus identified which could be used
to select the essential journals for a special collection.
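The arithmetic behind Brookes's example can be checked with a short sketch. It assumes that the logarithm in the formula is the natural logarithm (the text does not say, but this choice reproduces the quoted percentages) and that R(n) counts the articles in the n most productive journals. The function names are illustrative and do not come from the source.

```python
import math


def bradford_constant(total_articles, total_journals):
    """Solve R(N) = N * ln(N / s) = total_articles for the constant s."""
    return total_journals / math.exp(total_articles / total_journals)


def cumulative_articles(n, total_journals, s):
    """Articles expected in the n most productive journals: R(n) = N * ln(n / s)."""
    return total_journals * math.log(n / s)


# Figures from the example in the text: 2000 articles from 400 journals.
N = 400
TOTAL_ARTICLES = 2000

s = bradford_constant(TOTAL_ARTICLES, N)

# Share of all articles held by the top 5 percent and top 37 percent of journals.
share_top_5pct = cumulative_articles(0.05 * N, N, s) / TOTAL_ARTICLES
share_top_37pct = cumulative_articles(0.37 * N, N, s) / TOTAL_ARTICLES

print(f"s = {s:.2f}")
print(f"top 5% of journals:  {share_top_5pct:.0%} of articles")
print(f"top 37% of journals: {share_top_37pct:.0%} of articles")
```

Under these assumptions the two shares come out close to the 40 percent and 80 percent quoted in the example.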
Originally, Bradford had studied articles and journals to improve
abstracting services. He was concerned about the statistical distribution
he identified, and Fairthorne reports on this: "Though in public and,
rather ambiguously, in private Bradford tended to belittle this finding,
he did make use of it. His private conversations gave me the impression
that he was sure ... that he had not enough evidence or explanation to
sustain it in public debate."29 Others have since affirmed that there is
enough evidence to support Bradford's statistical distribution and to
link it to a general bibliometric distribution.30 Brookes cites numerous
uses of a Bradford bibliograph: items borrowed from a library, users
ranked by number of items they borrow, number of items cited (using a
nonrestrictive Bradford-Zipf distribution), and the index terms assigned
to documents.31 These uses of a Bradford distribution have value for
library decision-making, since the distribution allows for the prediction
of regularity in a variety of events. Knowledge of sources and their items
(i.e., the Bradford formula) permits prediction of core collections, core
users and core index terms. However, explanation is lacking which
would give theoretical import to Bradford's statistical distribution.
journals with
very recent references are considered to be at the research front as a hard
science. Those journals with references to more retrospective materials
are considered less hard, less scientific. For example, physics journals
contain the highest percentage of references to materials published in
the past five years (over 60 percent), while some English literature
journals only have 10 percent of their references dated in the past five
years.
Garfield developed a journal's impact factor as the number of
citations a journal receives divided by the number of articles published
in a given time period.38 Narin developed influence weights as the total
number of citations to a journal divided by the total number of references from a journal (excluding self-reference and self-citation).39
Although these measures are used to evaluate journals, they can also be
extended to evaluate authors by the number of citations individuals
receive. Meadows gives an account of the uses of such citations to assess
an author's reputation and importance.40
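The two journal measures described above reduce to simple ratios, as the following sketch illustrates. The counts are invented for illustration, the function names are hypothetical, and the handling of self-citation (subtracting the same count from both numerator and denominator) is one possible reading of the definition given here, not a statement of Narin's full procedure.

```python
def impact_factor(citations_received, articles_published):
    """Citations a journal receives divided by the articles it published in the period."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_received / articles_published


def influence_weight(citations_to, references_from, self_citations=0):
    """Citations to a journal divided by references from it.

    Assumption: a self-citation is counted once as a citation received and
    once as a reference given, so it is removed from both totals.
    """
    cited = citations_to - self_citations
    referencing = references_from - self_citations
    if referencing <= 0:
        raise ValueError("no references remain after removing self-citations")
    return cited / referencing


# Hypothetical counts for a single journal over one time period.
impact = impact_factor(citations_received=1200, articles_published=300)
weight = influence_weight(citations_to=1200, references_from=900, self_citations=100)

print(impact)  # citations per published article
print(weight)  # above 1 means the journal is cited more than it cites
```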
plines (e.g., the behaviors of the literatures associated with the humanities versus the literature of the social sciences versus the sciences).
Much of this research has focused on the literatures of the scientific
disciplines.
Since independent variables are grouped into conceptual areas the
interrelationships of which become the theory, the unit of analysis is
critical to the generality of the results. It is unlikely that research results
would ever be generalized beyond the unit of analysis. It could prove
impossible to generalize a common theory from studies of individuals
and studies of journals. At best, two middle-range theories might be
developed which could suggest hypotheses for a single, third area of
investigation. This hope of a unified theory has plagued other professions, and it is doubtful that bibliometrics can surpass the barrier
created by multiple units of analysis. Instead, it might be more productive to split the ill-defined field of bibliometrics into separate components where the unit of analysis is consistent and results can be
generalized across studies.
The various bibliometric models proposed here will need to pay
close attention to the issue of external validity. The models need to be
more than explanatory (i.e., explaining a large proportion of the variability in the dependent measure); indeed, the models will have to prove
their worth by making actual predictions using new cases. This allows
for the importance (or weight) of each variable in the model to be tested
in a rigorous manner. It provides proof that the theory works with new
data in real situations. It also assures that hypothesized nonlinear
relationships among the independent variables do, in fact, contribute to
explaining the variability in the dependent measures.
Finally, bibliometrics has much to offer the library and information field. The work of the past, by Lotka, Bradford and Zipf, is
valuable in helping librarians assess patterns of authorship (for cataloging rule changes), identifying core collections (for collection management), and designing better retrieval systems (for authority control).
However, the continued emphasis on the similarities of the bibliometric
statistical distributions is not regarded here as a fruitful endeavor. The
long-term benefits of bibliometrics will begin to emerge when attention
is directed toward causal explanations of bibliographic phenomena. At
that point, bibliometrics will again offer practical benefits to libraries.
References
1. Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25(Dec. 1969):348.
2. Pritchard, Alan. Computers, Statistical Bibliography and Abstracting Services,
1968. (unpublished); and Raisig, L.M. Statistical Bibliography in Health Sciences.
Bulletin of the Medical Library Association 50(July 1962):450, 461. Cited in Pritchard,
Statistical Bibliography, p. 349.
3. Simon, Herbert R. Why Analyze Bibliographies? Library Trends
22(July 1973):3-8; and Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. London: Clive Bingley, 1978.
4. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing and
Analysis. Stockholm: Royal Institute of Technology Library, 1980.
5. Pritchard, Alan, and Wittig, Glen. Bibliometrics: A Bibliography and Index
(1874-1959). Vol. 1. Watford, Eng.: ALLM Books, in press.
6. Fairthorne, Robert A. Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction. Journal of Documentation
25(Dec. 1969):319-43; Price, Derek de Solla. A General Theory of Bibliometric and Other
Cumulative Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306; and
Bookstein, Abraham. The Bibliometric Distributions. Library Quarterly 46(Oct.
1976):416-23.
7. Narin, Francis, and Moll, Joy K. Bibliometrics. In Annual Review of Information Science and Technology, edited by Martha E. Williams, p. 45. Vol. 12. Washington,
D.C.: American Society for Information Science, 1977.
8. Fairthorne, Empirical Hyperbolic Distributions, p. 322.
9. Rapoport, Anatol. Rank-Size Relations. In International Encyclopedia of
Statistics, edited by William Kruskal and Judith Tanur, p. 851. New York: Free Press,
1978.
10. Ibid., pp. 847-54; Price, A General Theory; and Bookstein, Bibliometric
Distributions.
11. Fairthorne, Empirical Hyperbolic Distributions, p. 321.
12. Carnap, Rudolf. Philosophical Foundations of Physics. New York: Basic Books,
1966, p. 228.
13. Ibid., p. 230.
14. Fairthorne, Empirical Hyperbolic Distributions, p. 332.
15. Price, A General Theory.
16. Rapoport, Rank-Size Relations, p. 851.
17. Ibid., p. 853.
18. Hill, Bruce M. Zipf's Law and Prior Distributions for the Composition of a
Population. Journal of the American Statistical Association 65(Sept. 1970):1230.
19. Line, Maurice B., and Sandison, Alexander. Practical Interpretation of Citation
and Library Use Studies. College & Research Libraries 36(Sept. 1975):393-96; and Line,
Maurice B. Rank Lists Based on Citations and Library Uses as Indicators of Journal
Usage in Individual Libraries. Collection Management 2(Winter 1978):313-16.
20. Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship, vol. 7, edited by Melvin J. Voigt and Michael
H. Harris, pp. 299-335, New York: Academic Press, 1977.
21. See Moll, Joy K. Bibliometrics in Library Collection Management: Preface to
the Special Issue on Bibliometrics. Collection Management 2(Fall 1978):195-98.
22. Hill, Bruce M. The Rank-Frequency Form of Zipf's Law. Journal of the
American Statistical Association 69(Dec. 1974):1025.
23. Narin and Moll, Bibliometrics, p. 46.
24. Merton, Robert K. The Sociology of Science: Theoretical and Empirical Investigations. Chicago: University of Chicago Press, 1973; and Zuckerman, Harriet. Scientific
Elite: Nobel Laureates in the United States. New York: Free Press, 1977.
25. Lindsey, Duncan. The Scientific Publication System in Social Science. San Francisco: Jossey-Bass, 1978, p. 89.
26. Mischel, Walter. Toward a Cognitive Social Learning Reconceptualization of
Personality. Psychological Review 80(July 1973):252-83.
27. Brookes, B.C. Numerical Methods of Bibliographic Analysis. Library Trends
22(July 1973):26.
28. Ibid., p. 27.
29. Fairthorne, Empirical Hyperbolic Distributions, p. 333.
30. Price, A General Theory; and Bookstein, Bibliometric Distributions.
31. Brookes, Numerical Methods of Bibliographic Analysis.
32. Garfield, Eugene. Citation Analysis as a Tool in Journal Evaluation. Science
178(3 Nov. 1972):476.
33. Swanson, Don R. Information Retrieval as a Trial-and-Error Process, Library
Quarterly 47(April 1977):128-48.
34. Narin and Moll, Bibliometrics, p. 46.
35. Burton, Robert E., and Kebler, R.W. The Half-Life of Some Scientific and
Technical Literatures. American Documentation 11(Jan. 1960):18-22.
36. Brookes, Numerical Methods of Bibliographic Analysis, p. 34.
37. Price, Derek de Solla. Citation Measures of Hard Science, Soft Science,
Technology, and Nonscience. In Communication Among Scientists and Engineers,
edited by Carnot E. Nelson and Donald K. Pollock, pp. 3-22. Lexington, Mass.: Heath
Lexington Books, 1970.
38. Garfield, Citation Analysis, pp. 471-79.
39. Narin, Francis. Evaluative Bibliometrics: The Use of Publication and Citation
Analysis in the Evaluation of Scientific Activity. Cherry Hill, N. J.: Computer Horizons,
1976. (PB 252 399)
40. Meadows, Arthur J. Communication in Science. London: Butterworths. 1974.
41. McGrath, William E. Circulation Studies and Collection Development. Collection Development in Libraries, edited by Robert D. Stueart and George B. Miller, Jr.,
pp. 373-403. Greenwich, Conn.: JAI Press, 1980.
42. Lindsey, Scientific Publication System; Price, Citation Measures of Hard Science;
and Garvey, William D. Communication: The Essence of Science. Oxford: Pergamon
Press, 1979.
Introduction
THE ORIGINAL STATEMENT of what has come to be known as Lotka's law
was made in Lotka's 1926 journal article, "The Frequency Distribution
of Scientific Productivity": "...the number (of authors) making n contributions is about 1/n^2 of those making one; and the proportion of all
contributors, that make a single contribution, is about 60 percent." To
derive his "inverse square law," Lotka used comprehensive bibliographies in chemistry and physics and plotted the percentage of authors
making 1, 2, 3,...n contributions against the number of contributions
with both variables on a logarithmic scale. He then used the least-squares method to calculate the slope of the line that best fit the plotted
data, and he found that the slope was approximately -2.
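The fitting procedure just described can be sketched in a few lines. The example below takes the ten observed proportions printed for the Chemical Abstracts data in Table 1 of this article and fits an ordinary least-squares line to them on a log-log scale. It is only an illustration of the technique: it uses just the ten tabulated values, so its slope need not equal the figure Lotka reported, and the function name is hypothetical.

```python
import math

# Observed proportions of authors making 1..10 contributions
# (Table 1 of this article, Chemical Abstracts data).
observed = [0.5792, 0.1537, 0.0715, 0.0416, 0.0267,
            0.0190, 0.0164, 0.0123, 0.0093, 0.0094]


def loglog_slope(proportions):
    """Least-squares slope of log(proportion) against log(number of contributions)."""
    xs = [math.log10(n) for n in range(1, len(proportions) + 1)]
    ys = [math.log10(p) for p in proportions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(observed)
print(f"fitted slope: {slope:.2f}")  # an inverse square law would give -2
```

A slope near -2 corresponds to the inverse square relationship; a noticeably different slope suggests a different exponent for the data at hand.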
Since the publication of Lotka's original article in 1926, much
research has been done on author productivity in various subject fields.
The publications arising from this research have come to be associated
with Lotka's work and are often cited as proving or supporting his
findings. However, a review of this literature reveals that Lotka's article
was not cited until 1941, that his distribution was not termed "Lotka's
law" until 1949, and that no attempts were made to test the applicability
of Lotka's law to other disciplines until 1973. The present article will
discuss the literature that has become associated with Lotka's law and
will attempt to identify the important factors of Lotka's original methodology which should be considered when attempting to test the
applicability of Lotka's law.
William Gray Potter is Acquisitions Librarian, University of Illinois Library at Urbana-Champaign.
TABLE 1
LOTKA, Chemical Abstracts DATA
PROPORTION OF AUTHORS

No.             Observed                  Expected
Contributions   Proportion   S_n(X)       Proportion   F_o(X)     |F_o(X) - S_n(X)|
 1              0.5792       0.5792       0.6079       0.6079     0.0287
 2              0.1537       0.7329       0.1520       0.7599     0.0270
 3              0.0715       0.8044       0.0675       0.8274     0.0230
 4              0.0416       0.8460       0.0380       0.8654     0.0194
 5              0.0267       0.8727       0.0243       0.8897     0.0170
 6              0.0190       0.8917       0.0169       0.9066     0.0149
 7              0.0164       0.9081       0.0124       0.9190     0.0109
 8              0.0123       0.9204       0.0095       0.9285     0.0081
 9              0.0093       0.9297       0.0075       0.9360     0.0063
10              0.0094       0.9391       0.0061       0.9421     0.0030
TABLE 2
LOTKA, AUERBACH DATA
PROPORTION OF AUTHORS

No.             Observed                  Expected
Contributions   Proportion   S_n(X)       Proportion   F_o(X)     |F_o(X) - S_n(X)|
 1              0.5917       0.5917       0.6079       0.6079     0.0162
 2              0.1540       0.7457       0.1520       0.7599     0.0142
 3              0.0958       0.8415       0.0675       0.8274     0.0141
 4              0.0377       0.8792       0.0380       0.8654     0.0138
 5              0.0249       0.9041       0.0243       0.8897     0.0144
 6              0.0211       0.9252       0.0169       0.9066     0.0186
 7              0.0143       0.9395       0.0124       0.9190     0.0205
 8              0.0143       0.9538       0.0095       0.9285     0.0253
 9              0.0045       0.9583       0.0075       0.9360     0.0223
10              0.0053       0.9636       0.0061       0.9421     0.0215

At the 0.01 level of significance, K-S statistic = 1.63/√n = 0.0448
Lotka's Law
Lotka's inverse square law of scientific productivity has since been
shown to fit data drawn from several widely varying time periods and
disciplines.
While some of these studies do not cite sources, those that do often cite
Derek de Solla Price's Little Science, Big Science.13 Those that go
beyond Price cite Dresden, Dufrenoy, Davis, Williams, Zipf, Leavens,
and Simon.14 Several authors, following Price's lead, have assumed
Lotka's law to have been proved and have proceeded to discuss why the
distribution occurs, i.e., why some authors produce more or less than
others. These include later works by Price, Bookstein, Allison et al., and
Shockley.15 These efforts to explain and refine Lotka's formulation are
interesting and valuable. In looking at the work of these authors,
however, it appears that some misunderstanding has developed, for, in
fact, most of the studies cited as demonstrating Lotka's law do not
mention Lotka and do not offer comparable data.
Dresden is the earliest author cited in relation to Lotka's law.16
Although Hubert refers to Dresden's article as subsequent to Lotka's
work, it did, in fact, appear in 1922. Dresden lists authors who presented papers at the regular meetings of the Chicago section of the
American Mathematical Society (AMS). While Dresden does mention
that 59 percent of the papers were later published, he is not concerned
with the publishing behavior of the authors involved. Hubert claims
that Dresden studied the output of American mathematicians. Actually, the authors studied were members of a regional section of AMS.
Dresden's purpose is to provide a record of the work of the Chicago
section of the AMS, not to make a generalization about the productivity
of mathematicians. To do so from Dresden's figures would be misleading, because the Chicago section of the AMS may not be representative
of all mathematicians, and because the figures apply to presented papers, not publications. Dresden's work is interesting, but its relation to
Lotka's law is questionable.
Dufrenoy attempted to study the publishing behavior of biologists
by analyzing the index to the Review of Applied Mycology for 1932,
1934 and 1935, and papers published in volumes 115, 118 and 120 of
Comptes Rendus de la Société de Biologie (1932, 1934, 1935). He is
interested in the publishing behavior of biologists on an annual basis,
not in the rate of productivity over time as Lotka is. Dufrenoy does not
even cite Lotka, let alone attempt to apply Lotka's inverse square to his
data.
Davis, in 1941,19 is the first author to cite Lotka in the fifteen years
following Lotka's original article. He also used Dresden's data, thus
TABLE 3
LEAVENS, PAPERS PRESENTED AT MEETINGS OF THE ECONOMETRICS SOCIETY
OR IN Econometrica, 1933-52

No.             No.             %               Total No.
Contributions   Contributors    Contributors    Contributions
 1              436             60.47             436
 2              107             14.84             214
 3               61              8.46             183
 4               40              5.55             160
 5               14              1.94              70
 6               23              3.19             138
 7                6              0.83              42
 8               11              1.53              88
 9                1              0.14               9
11                4              0.55              44
12                2              0.28              24
13                3              0.42              39
14                2              0.28              28
16                1              0.14              16
17                2              0.28              34
18                1              0.14              18
23                1              0.14              23
24                1              0.14              24
28                2              0.28              56
30                1              0.14              30
37                1              0.14              37
46                1              0.14              46
TOTAL           721            100.00           1,759
TABLE 4
LEAVENS
PROPORTION OF AUTHORS

No.             Observed                  Expected
Contributions   Proportion   S_n(X)       Proportion   F_o(X)     |F_o(X) - S_n(X)|
 1              0.6047       0.6047       0.6079       0.6079     0.0032
 2              0.1484       0.7531       0.1520       0.7599     0.0068
 3              0.0846       0.8377       0.0675       0.8274     0.0103
 4              0.0555       0.8932       0.0380       0.8654     0.0278
 5              0.0194       0.9126       0.0243       0.8897     0.0229
 6              0.0319       0.9445       0.0169       0.9066     0.0379
 7              0.0083       0.9528       0.0124       0.9190     0.0338
 8              0.0153       0.9681       0.0095       0.9285     0.0396
 9              0.0014       0.9695       0.0075       0.9360     0.0335

n = 721
D = Max |F_o(X) - S_n(X)| = 0.0396
At the 0.01 level of significance, K-S statistic = 1.63/√n = 0.0607
D < 0.0607
Therefore, Lotka's law holds for Leavens' data.
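The Kolmogorov-Smirnov comparison summarized in Table 4 can be reproduced with a short sketch. It assumes that the expected proportions follow the inverse square law normalised by 6/π² (about 0.6079, which matches the expected column of the table) and uses the 0.01-level critical value 1.63/√n given in the table note. Variable and function names are illustrative only.

```python
import math

# Observed proportions of contributors making 1..9 contributions (Table 4, Leavens).
observed = [0.6047, 0.1484, 0.0846, 0.0555, 0.0194,
            0.0319, 0.0083, 0.0153, 0.0014]
n_authors = 721


def lotka_expected(k):
    """Expected proportion of authors with k contributions under the inverse square law.

    The sum of 1/k^2 over all k is pi^2/6, so 6/pi^2 normalises the proportions.
    """
    return 6.0 / (math.pi ** 2 * k ** 2)


def ks_statistic(observed_props):
    """Largest absolute gap between expected and observed cumulative proportions."""
    cum_obs = 0.0
    cum_exp = 0.0
    d_max = 0.0
    for k, p in enumerate(observed_props, start=1):
        cum_obs += p
        cum_exp += lotka_expected(k)
        d_max = max(d_max, abs(cum_exp - cum_obs))
    return d_max


d = ks_statistic(observed)
critical = 1.63 / math.sqrt(n_authors)  # critical value at the 0.01 level
fits = d < critical

print(f"D = {d:.4f}, critical value = {critical:.4f}, fits Lotka's law: {fits}")
```

Small differences from the tabulated D in the fourth decimal place can arise because the table works from proportions rounded to four places.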
With the exception of Leavens, no new data fitting Lotka's law are
found in the above articles, and the figures from Leavens could be
suspect. Yet presumably these studies are the ones invoked as proof of
the applicability of Lotka's law by later authors, e.g., "It has been
shown to hold for the productivity patterns of chemists, physicists,
mathematicians, and econometricians."29 In point of fact, no published
article attempts to apply or test Lotka's law until Murphy in 1973. A
critique of Murphy's article is provided by Coile and is described above;
Hubert also faults Murphy.30
After Murphy, the next published application of Lotka's law is
Voos in a 1974 study of information science. Taking his data from all
articles indexed in Information Science Abstracts for 1966-70, Voos
proposes that the inverse square law does not hold for information
science and that -3.5 is a better constant for this particular discipline.31
The error Voos makes is pointed out by Coile in a subsequent letter to
the editor.32 Voos lists the five years under study separately and then
simply adds the tabulations for the individual years to arrive at a total
for the five years: i.e., the number of authors publishing one paper in
1966, 1967, 1968, 1969, and 1970 were added together to arrive at a figure
for all authors publishing one paper. Thus, an author publishing one
paper per year would be credited with only one paper for the five years
and not five, as he should be. As Coile points out, Voos is studying
single years of data whereas Lotka studied a number of years. Like
Dufrenoy, Voos defines an important area for research in analyzing
author productivity on an annual basis.
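The difference between the two ways of tabulating can be made concrete with a small sketch. The records below are invented for illustration, and the function names are hypothetical; the point is only that summing yearly tabulations and pooling each author's output over the whole period produce different productivity distributions from the same papers.

```python
from collections import Counter

# Hypothetical records, one (author, year) tuple per paper.
papers = [
    ("A", 1966), ("A", 1967), ("A", 1968), ("A", 1969), ("A", 1970),
    ("B", 1966), ("B", 1966),
    ("C", 1970),
]


def pooled_distribution(records):
    """Count each author's papers over the whole period, then tabulate authors by output."""
    per_author = Counter(author for author, _year in records)
    return Counter(per_author.values())


def summed_yearly_distribution(records):
    """Tabulate authors by output within each year and add the yearly tables together."""
    per_author_year = Counter(records)
    return Counter(per_author_year.values())


pooled = pooled_distribution(papers)
yearly = summed_yearly_distribution(papers)

print("pooled over five years:", dict(pooled))
print("yearly tables summed:  ", dict(yearly))
```

In the pooled count, author A appears once as a five-paper author; in the summed yearly tables, A appears five times as a one-paper author, which inflates the low-productivity end of the distribution.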
Schorr has published three articles dealing with Lotka's law in
library science, history of legal medicine, and map librarianship. The
faults of the last article are documented by Coile as described earlier.
The first article is similarly flawed because, as Tudor points out in a
subsequent letter to the editor, Schorr uses only two journals, College
& Research Libraries and Library Quarterly, for 1963-72. Schorr concludes that the data on the history of legal medicine do not fit Lotka's
law. Tudor terms Schorr's article a "frivolous bagatelle," but it did
reawaken interest in Lotka. However, the choice of such a restricted
subject field contrasts sharply with Lotka's use of the topics of physics
and chemistry.
Rogge attempts to apply Lotka's law to the literature of anthropology. He cites Lotka and claims that Lotka's law has been tested
positively many times. Using the 40-year cumulative index of the
American Anthropologist (1888-1928) and the 30-year cumulative index
of American Antiquity (1935-65), Rogge concludes that it was clear
analyzing the results of the previous studies, however, it was found that
their scope and applicability is limited, since, first, their sampling
background does not go much beyond the original data brought by
Lotka and his early followers and, second, some basic concepts involved
in these studies are anticipated without ever being thoroughly investigated. Vlachý also compiled A Bibliography of Lotka's Law and
Related Phenomena.43 This comprehensive bibliography lists works
of interest not only on Lotka but also on the related laws of Bradford and
Zipf, as well as bibliometrics and frequency distributions in general.
In a 1975 letter to the editor of the Journal of Documentation, Coile
criticizes Kochen's discussion of authorship in the latter's Principles of
Information Retrieval.44 In this letter, Coile offers some useful insights
into how the work of Leavens, Simon, Davis, and Dresden came to be
associated with Lotka.45
Lotka's Law and Monograph Productivity
From this review of the literature, it can be argued that there have
been no studies that replicate Lotka's methodology closely enough to be
compared to Lotka's original work. Few of the authors of these studies
should be faulted for this, because until Murphy's paper in 1973, no one
attempted to compile new data to compare to Lotka's findings. Rather,
earlier work by Dresden, Dufrenoy, Davis, Williams, and Leavens
became associated with Lotka's work by subsequent authors and cited
by some as providing proof of Lotka's law. Murphy, Schorr, Voos, and
others in the 1970s sought to test Lotka's law in various disciplines, but
failed to match the conditions under which Lotka conducted his study,
usually because a suitable bibliographic source was not available.
Vlachý identified two variables which influence the distribution of
author productivity: (1) the time period under study, and (2) the community of authors involved. None of the studies discussed above match
Lotka's study in both these variables. Lotka's study covered ten years for
the Chemical Abstracts figures, and all of history up to 1900 for Auerbach. Those that do match or surpass Lotka in time period, notably
Rogge, do not match him in the selection of a community of authors. In
Lotka's study of Chemical Abstracts, the community consists of all
senior authors whose work was included in the 1907-16 decennial index.
In his study of Auerbach, the community of authors consists of authors
of the most notable works in the field of physics up to 1900. In most
studies of author productivity, it is usually the subject field that defines
a community of authors, because that is how journals and bibliographies
TABLE 5
UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN
STUDY OF PERSONAL AUTHORS IN THE CARD CATALOG

No.       No.        %               Total No.
Works     Authors    Total Sample    Entries
    1     1,489       63.50           1,489
    2       343       14.63             686
    3       160        6.82             480
    4        92        3.92             368
    5        44        1.88             220
    6        35        1.49             210
    7        27        1.15             189
    8        18        0.77             144
    9        12        0.51             108
   10        11        0.47             110
   11        10        0.43             110
   12         9        0.38             108
   13         2        0.09              26
   14         6        0.26              84
   15         9        0.38             135
   16         8        0.34             128
   17         3        0.13              51
   18         2        0.09              36
   19         2        0.09              38
   20         5        0.21             100
   21         5        0.21             105
   22         1        0.04              22
   23         1        0.04              23
   24         2        0.09              48
   26         1        0.04              26
   27         1        0.04              27
   28         4        0.17             112
   30         2        0.09              60
   31         1        0.04              31
   32         3        0.13              96
   33         1        0.04              33
   34         1        0.04              34
   35         1        0.04              35
   36         3        0.13             108
   38         2        0.09              76
   39         1        0.04              39
   40         2        0.09              80
   42         2        0.09              84
   44         2        0.09              88
   47         1        0.04              47
   48         1        0.04              48
   49         1        0.04              49
   51         1        0.04              51
   58         1        0.04              58
   63         1        0.04              63
   66         1        0.04              66
   70         1        0.04              70
   90         1        0.04              90
  111         1        0.04             111
  115         1        0.04             115
  149         1        0.04             149
  167         1        0.04             167
  231         1        0.04             231
  266         1        0.04             266
  298         1        0.04             298
  379         1        0.04             379
  592         1        0.04             592
  652         1        0.04             652
  835         1        0.04             835
1,374         1        0.04           1,374
1,490         1        0.04           1,490
TOTALS    2,345      100.00          13,148
TABLE 6
UNIVERSITY OF ILLINOIS LIBRARY AT URBANA-CHAMPAIGN
PROPORTION OF AUTHORS

Titles/    Theoretical              Observed
Author     (Lotka)       F0(X)      (Illinois)    Sn(X)      |F0(X) - Sn(X)|
   1        0.6079       0.6079      0.6350       0.6350         0.0271
   2        0.1520       0.7599      0.1463       0.7813         0.0214
   3        0.0675       0.8274      0.0682       0.8495         0.0221
   4        0.0380       0.8654      0.0392       0.8887         0.0233
   5        0.0243       0.8897      0.0188       0.9075         0.0178
   6        0.0169       0.9066      0.0149       0.9224         0.0158
   7        0.0124       0.9190      0.0115       0.9339         0.0149
   8        0.0095       0.9285      0.0077       0.9416         0.0131
   9        0.0075       0.9360      0.0051       0.9467         0.0107

D < 0.0337
Therefore, UI Library data fit Lotka's law.
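The comparison in table 6 can be retraced with a short calculation. The sketch below is illustrative only: it assumes that the theoretical column is the inverse-square form of Lotka's law with constant 6/pi^2 (which reproduces the tabulated 0.6079), and that the threshold 0.0337 is the usual large-sample Kolmogorov-Smirnov critical value at the 0.01 level, 1.63 divided by the square root of the sample size. Neither assumption is stated in the table itself, and the names used are hypothetical.

```python
import math

# Observed numbers of authors with 1 to 9 titles (table 5) and the
# total number of authors in the Illinois sample.
OBSERVED_COUNTS = [1489, 343, 160, 92, 44, 35, 27, 18, 12]
TOTAL_AUTHORS = 2345


def lotka_proportion(n):
    """Assumed inverse-square proportion of authors with n titles."""
    return 6.0 / (math.pi ** 2 * n ** 2)


def ks_statistic(observed_counts, total):
    """Largest absolute gap between the cumulative theoretical and
    cumulative observed proportions over the rows supplied."""
    cum_theoretical = 0.0
    cum_observed = 0.0
    d_max = 0.0
    for n, count in enumerate(observed_counts, start=1):
        cum_theoretical += lotka_proportion(n)
        cum_observed += count / total
        d_max = max(d_max, abs(cum_theoretical - cum_observed))
    return d_max


d_max = ks_statistic(OBSERVED_COUNTS, TOTAL_AUTHORS)
# Assumed critical value at the 0.01 level for a large sample.
critical = 1.63 / math.sqrt(TOTAL_AUTHORS)
print(f"D = {d_max:.4f}; critical value = {critical:.4f}; fit: {d_max < critical}")
```

The largest gap occurs in the first row, for authors with a single title, as in the table; small differences in the last decimal place arise because the table works from rounded proportions.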
Why the LC figures do not fit, while the Illinois figures do, is open
to conjecture. One reason might be that the LC data include persons
occurring as subjects as well as authors. Another possible cause is that
the Illinois figures cover authors from the beginning of history to the
present, while LC figures cover catalog records established over ten
years. This could also be the reason Lotka's Auerbach figures fit, but not
the Chemical Abstracts data. In any event, the fact that an exact fit is
lacking in the Library of Congress figures is not as important as the
emergence of a general rule which implies that a sufficiently large
sample of a broad community of authors and a large time span will
approximate Lotka's law.

TABLE 7
LIBRARY OF CONGRESS
ANALYSIS OF PERSONAL NAME HEADINGS ON MARC TAPES

No.             No. Distinct    % Distinct
Occurrences     Headings        Headings
1                456,328          65.65
2                119,681          17.22
3                 46,247           6.65
4                 23,951           3.45
5                 13,820           1.99
6                  8,790           1.26
7                  5,827           0.84
8                  4,056           0.58
9                  2,998           0.43
10                 2,153           0.31
11-13              4,116           0.59
14-20              3,748           0.54
21-50              2,678           0.39
51-100               448           0.06
101-200              149           0.02
201-300               47           0.01
301-400               19           0.00
401-500               11           0.00
501-1000               5           0.00
1001+                  2           0.00
Total            695,074          99.99
It is of further interest to note that both the LC and Illinois figures
were compiled for a practical management problem: planning for the
implementation of the second edition of the Anglo-American Cataloging Rules. It is not uncommon for other bibliometric formulations to be
used for practical planning, notably Bradford's distribution for planning periodical collections. This, however, is the first known case where
Lotka's law has been useful in planning.
Conclusion
It has been seen that Lotka's law fits only a portion of the data from
his 1926 study and that his most-cited figures, those for Chemical
Abstracts from 1907 to 1916, do not fit his distribution. Later studies
assume that Lotka's law had been proven to apply in a variety of subject
areas, when in fact it had not. No data were compiled for the express
purpose of verifying the law until the 1970s, and these recent studies,
while valuable and useful, are not comparable to Lotka's study in terms
of the time period covered and the community of authors involved.
Recent studies of monograph productivity suggest that Lotka's law
might reflect an underlying pattern in the behavior of those people who
produce publications, whether those publications are books or journal
articles. It would appear that when the time period covered is ten years
or more and the community of authors is defined broadly, author
productivity approximates the frequency distribution that Lotka
observed and that has become known as Lotka's law. If this is correct,
then there is a universal community of all authors who have ever
published whose pattern of productivity might approximate Lotka's
law. Within this universal community, there are many subcommunities
defined, as Vlachý points out, by discipline, nation, institution, journal, etc. Even time could be used as a dimension to define a subcommunity. All studies of author productivity are concerned with a subset of the
universal community of authors. The smaller the subset, the less likely
it will be that the measurements of productivity reflect the measurements for the universal community, although these measurements may
be useful and valuable in studying that particular subset. However, the
larger and more representative the subset, the more closely it will
resemble the universal community. The subsets studied by Lotka and
those represented in the Library of Congress study of its MARC tapes
and in the study of the University of Illinois Library card catalog are the
largest yet considered, and the similarity of their patterns of author
productivity and behavior suggests that broader patterns do indeed exist.
The above review of literature associated with Lotka's law suggests
several areas for future research. First, the work of Dufrenoy and others
on the annual productivity of authors points to an interesting measure
of author behavior. Second, Radhakrishnan and Kernizan make a convincing argument for the use of large-scale machine-readable data bases
in the study of author productivity. They suggest that the machine
version of Engineering Index could be used, and this would be especially interesting in that Engineering Index is a multidisciplinary data
base with records that are well indexed. Thus, subsets could be defined
References
1. Lotka, Alfred J. The Frequency Distribution of Scientific Productivity. Journal of the Washington Academy of Sciences 16(19 June 1926):323.
2. Coile, Russell C. Lotka's Frequency Distribution of Scientific Productivity. Journal of the ASIS 28(Nov. 1977):366.
3. Murphy, Larry J. Lotka's Law in the Humanities? Journal of the ASIS 24(Nov.-Dec. 1973):461-62.
4. Schorr, Alan E. Lotka's Law and Map Librarianship. Journal of the ASIS 26(May-June 1975):189-90.
5. Decennial Index to Chemical Abstracts, vols. 1-10, 1907-1916. Easton, Pa.: American Chemical Society, n.d.
6. Auerbach, Felix. Geschichtstafeln der Physik. Leipzig: Barth, 1910.
7. Schorr, Alan E. Lotka's Law and Library Science. RQ 14(Fall 1974):32-33.
8. Voos, Henry. Lotka and Information Science. Journal of the ASIS 25(July-Aug. 1974):270-72.
9. Turkeli, Arif. The Doctoral Training Environment and Post-Doctorate Productivity Among Turkish Physicists. Science Studies 3(1978):311-18; Krisciunas, Kevin. Letter to the editor in Journal of the ASIS 28(Jan. 1977):65-66; Hubert, John J. Letter to the editor in Journal of the ASIS 28(Jan. 1977):66; and Allison, Paul D., and Stewart, John A. Productivity Differences among Scientists: Evidence for Accumulative Advantage. American Sociological Review 39(Aug. 1974):596-606.
10. Krisciunas, letter to the editor, p. 65.
11. Turkeli, Doctoral Training Environment, p. 311.
12. Allison and Stewart, Productivity Differences among Scientists, p. 596.
13. Price, Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963.
14. Dresden, Arnold. A Report on the Scientific Work of the Chicago Section, 1897-1922. American Mathematical Society Bulletin 28(July 1922):303-07; Dufrenoy, Jean. The Publishing Behavior of Biologists. Quarterly Review of Biology 13(June 1938):207-10; Davis, Harold T. The Analysis of Economic Time Series. Bloomington, Ind.: Principia Press, 1941. See also ________. Theories of Econometrics. Bloomington, Ind.: Principia Press, 1941, pp. 45-50; Williams, C.B. The Numbers of Publications Written by Biologists. Annals of Eugenics 12(1944):143-46; Zipf, George K. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Cambridge, Mass.: Addison-Wesley, 1949; Leavens, Dickson H. Letter to the editor in Econometrica 21(Oct. 1953):630-32; and Simon, Herbert A. On a Class of Skew Distribution Functions. Biometrika 42(Dec. 1955):425-40; reprinted in ________. Models of Man. New York: Wiley, 1957.
15. Price, Derek de Solla, and Gursey, S. Studies in Scientometrics. Part I. Transience and Continuance in Scientific Authorship. International Forum on Information and Documentation 1(1976):17-24; Bookstein, Abraham. The Bibliometric Distributions. Library Quarterly 46(Oct. 1976):416-23; Bookstein, Abraham. Patterns of Scientific Productivity and Social Change. Journal of the ASIS 28(July 1977):206-10; Allison, Paul D., et al. Lotka's Law: A Problem in Its Interpretation and Application. Social Studies of Science 6(1976):269-76; and Shockley, William. On the Statistics of Individual Variations of Productivity in Research Laboratories. Proceedings of the Institute of Radio Engineers 45(March 1957):279-90.
16. Dresden, Report on the Scientific Work.
17. Hubert, letter to the editor, p. 66.
18. Dufrenoy, Publishing Behavior of Biologists.
19. Davis, Analysis of Economic Time Series.
20. Williams, Publications Written by Biologists.
21. Zipf, Human Behavior.
22. Leavens, letter to the editor.
23. Simon, On a Class of Skew Distribution Functions.
24. Price, Little Science, p. 43.
25. Ibid., pp. 48-49.
26. Bookstein, Patterns of Scientific Productivity; and Allison, et al., Lotka's Law.
27. Fairthorne, Robert A. Progress in Documentation. Journal of Documentation 25(Dec. 1969):325.
28. Naranan, S. Power Law Relations in Science Bibliography-A Self-Consistent Interpretation. Journal of Documentation 27(June 1971):83-97; and Bookstein, Bibliometric Distributions.
29. Krisciunas, letter to the editor, pp. 65-66.
30. Coile, Lotka's Frequency Distribution; and Hubert, letter to the editor.
31. Voos, Lotka and Information Science.
32. Coile, Russell C. Letter to the editor in Journal of the ASIS 26(March-April 1975):133.
33. Schorr, Alan E. Lotka's Law and the History of Legal Medicine. Research in Librarianship 30(Sept. 1975):205-09.
34. Tudor, Dean. Letter to the editor in RQ 14(Winter 1974):187.
35. Rogge, A.E. A Look at Academic Anthropology. American Anthropologist 78(Dec. 1976):835.
36. Ibid.
37. Radhakrishnan, T., and Kernizan, R. Lotka's Law and Computer Science Literature. Journal of the ASIS 30(Jan. 1979):51-54.
38. Ibid., p. 54.
39. Vlachý, Jan. Variable Factors in Scientific Communities (Observations on Lotka's Law). Teorie a Metoda 4(1972):91-120.
40. ________. Time Factor in Lotka's Law. Probleme de Informare si Documentare 10(1976):44-87.
41. Ibid., p. 48.
42. Ibid., p. 46.
43. ________, comp. Frequency Distribution of Scientific Performance: A Bibliography of Lotka's Law and Related Phenomena. Scientometrics, Bibliography Section 1(1978):109-30.
44. Coile, Russell C. Letter to the editor in Journal of Documentation 31(Dec. 1975):298-301; see also Kochen, Manfred. Principles of Information Retrieval. Los Angeles: Melville, 1974.
45. Other works which cite and discuss Lotka to some extent include: Aiyepeku, Wilson O. The Productivity of Geographical Authors: A Case Study from Nigeria. Journal of Documentation 32(June 1976):105-17; Cole, Jonathan R., and Cole, Stephen. The Ortega Hypothesis. Science 178(Oct. 1972):368-75; Mantell, Leroy H. On Laws of Special Abilities and the Production of Scientific Literature. American Documentation 17(Jan. 1966):8-16; and Narin, Francis, et al. Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity. Cherry Hill, N.J.: Computer Horizons, Inc., 1976. (CH Project No. 704R)
46. McCallum, Sally H., and Godwin, James L. Statistics of Headings in the MARC File. Network Development Office, Library of Congress, unpublished paper, 5 Jan. 1981.
47. Potter, William G. When Names Collide: Conflict in the Catalog and AACR2. Library Resources & Technical Services 24(Winter 1980):3-16.
M. CARL DROTT
NATURAL LAWS DESCRIBE PATTERNS which are regular and recurring. The
scientific point of a law is twofold. First, a concrete statement of a law
may give us the ability to better predict events or to shape our
reactions to them. Second, a physical law may help in the development
of theories which explain why a particular pattern occurs. Natural laws
therefore are of interest because they offer the opportunity for empirical
application and for theoretical understanding. On the other hand, the
ability to articulate a law does not automatically guarantee either
empirical or theoretical advances.
Bradford's law begins with a regularity which is observed in the
retrieval or use of published information. Broadly speaking, this regularity is characterized by both concentration and dispersion of specific
items of information over different sources of information. Thus, for a
search on some specific topic, a large number of the relevant articles will
be concentrated in a small number of journal titles. The remaining
articles will be dispersed over a large number of titles. Throughout the
remaining discussion, journal articles will be used to represent the items
retrieved and journals will be the sources. This is in keeping with most
of the Bradford's law literature, although there is clear evidence that
similar patterns occur for other kinds of items and sources.
The literature on Bradford's law incorporates both theoretical and
empirical aspects. These aspects are each coherent and developing areas
of scientific inquiry. Confusion arises, however, when the two aspects
M. Carl Drott is Associate Professor, School of Library and Information Science, Drexel
University, Philadelphia.
× 5 × 5) as many titles. Thus, to show title groups contributing an equal
number of articles, one could write:
$$9 : 9 \times 5 : 9 \times 5^2$$
Recognizing that the size of the core (9) and the multiplier (5) might be
different for other searches, we divide the groups by nine and replace the
multiplier with a variable. This gives groups of titles with sizes:
$$1 : a : a^2$$
where each of the three groups of titles contributes the same number of
articles.
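The grouping just described can be tried on any ranked list of journal counts. The following is a minimal sketch under stated assumptions, not a procedure taken from the Bradford literature: the function name `bradford_zones`, the rule for closing a group as soon as its share of articles is reached, and the sample counts are all hypothetical. The counts are contrived so that three groups of 1, 3 and 9 titles each contribute 18 articles, giving a multiplier a of 3.

```python
def bradford_zones(article_counts, n_zones=3):
    """Split titles, ranked by productivity, into groups that each
    contribute roughly the same number of articles. Returns the
    number of titles in each group."""
    ranked = sorted(article_counts, reverse=True)
    total = sum(ranked)
    sizes = []
    titles_in_zone = 0
    running = 0
    for count in ranked:
        running += count
        titles_in_zone += 1
        # Close the current group once the cumulative article count
        # reaches the next equal share of the total.
        reached = running * n_zones >= total * (len(sizes) + 1)
        if len(sizes) < n_zones - 1 and reached:
            sizes.append(titles_in_zone)
            titles_in_zone = 0
    sizes.append(titles_in_zone)
    return sizes


# Hypothetical search: 1 title with 18 articles, 3 titles with 6 each,
# and 9 titles with 2 each (54 articles in all).
counts = [18] + [6] * 3 + [2] * 9
zones = bradford_zones(counts)
multipliers = [zones[i + 1] / zones[i] for i in range(len(zones) - 1)]
print(zones, multipliers)
```

With real search results the groups will rarely contain exactly equal numbers of articles, so the successive multipliers will differ from one another; this is the imperfect fit that the surrounding discussion notes.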
This is the first theoretical statement of Bradford's law. Note that
while it was founded on empirical observation, it is not derived strictly
from the data. (As noted above, the data do not quite fit the law either in
the exact number of articles in each group or in matching the calculated
number of titles to the observed number.) As a statement of a natural law
this formulation has several shortcomings. The most serious problem is
that the phenomenon is described in terms of groups of journals. These
rather large aggregations of titles seem to be an artifact of the statement
of the law. That is, it appears that the dispersion of articles over ranked
titles is mathematically regular rank by rank rather than being regular
only for groups. There is also no hint in the formula or its derivation as
to what kind of underlying probabilistic process creates this scattering.
Bradford's formulation also leaves unanswered questions for those
working with empirical data. How does one establish the size of the
core? What is the best value of a for any particular set of data (recognizing that, as above, no value of a fits the observations exactly)? These
questions are indicative of the gap that arises between empirical and
theoretical consideration of the phenomenon.
Work on clarifying and refining the theoretical statement of Bradford's law was undertaken by B.C. Vickery,2 M.G. Kendall,3 F.F. Leimkuhler,4
and others. The most profound impact on the theoretical
foundation of Bradford's law has come from the efforts of B.C. Brookes.5
Brookes began with Bradford's ratios as portrayed above. Drawing
on the work of Vickery, he derived a formula which did not depend on
groupings of journal titles. The formula was this:
$$R(n) = k \log(n)$$
where:
$$R(n) = k \log(n/s)$$
He also imposed the limitation that this statement of Bradford's law
may not hold for the most frequently appearing titles in a data set. This
modification can be viewed as a speculation on the fundamental theoretical question. That question asks the underlying reason for the
observed regularity. This modification, in essence, says that the underlying process which creates the regularity may be different from the
process which causes the top-ranked titles to diverge from regularity. In
other words, the behavior of the top-ranked journals may present a
different theoretical problem than the pattern of the remaining titles.
There is another problem in accommodating the mathematical
form of Bradford's law to the observed data. In this case, the issue
involves those titles which contribute only a few articles (or a single
article) each. Empirical data show that there are not as many of these
little-used sources as the theory would predict. If the formula is correct,
then the total number of titles found must be exactly the value of k. In
practice, observed searches fall short of this number.
The data on little-used titles again raise a problem for theorists:
either to modify the statement of the law or to reject the empirical data.
Rejecting the data in this case means assuming that the observed
searches are incomplete. Realistically, however, many of the searches
are well and painstakingly done. It is hard to imagine how they could be
made more complete.
Theorists have chosen to accept the mathematical formula and
reject the empirical data. The reasons for this choice illustrate an
important aspect of the difference between theory and empiricism. The
important factor to theorists is that the mathematical form of Bradford's
law as stated above is very agreeable in a mathematical sense. In its
present form, Bradford's law can be related to other mathematical
models of dispersion. These models include the gamma, Poisson, and
binomial distributions. These other distributions have been extensively
studied. The scattering phenomena which these distributions have been
shown to describe seem related to bibliometric scattering. Thus, in
rejecting the empirical data, theorists are not saying that they believe
that searches are incomplete or that k accurately predicts the number of
titles that will be found. Theorists are instead saying that they believe
that the advancement of understanding lies in the study of certain
mathematical forms. The question of conformity to empirical data is
seen as less important in this situation.
ponding to the variable R(n), and a cumulative percentage. From an
empirical point of view, the cumulative percentage of articles is the
most important. The pattern is that a high percentage of the articles
comes from a very small number of journals. At this point any knowledgeable librarian can nod in agreement. Good practice dictates that the
most-used titles must be identified and their availability assured. On the
other hand, there are a large number of titles with low usage. Only the
largest budget could justify holding them all. Yet, it is clear that access
must be provided.
The discussion above is better classed as conventional wisdom than
as exploitation of a natural law. The challenge (as yet unmet) of empirical studies is to find a way of using quantitative regularity to make
decisions which are more precise than simple intuition would provide.
Before we can say much about using Bradford's law, we must have
some way of knowing if a set of data conforms to the law. This immediately raises problems. In every kind of goodness-of-fit test we need to
have some source of predicted values against which to judge our data.
Thus, we must ask the question: What is Bradford's law? The usual
answer is that it is the formula for R(n) given earlier. But this is not
completely rational. As discussed above, the formula is known to be in
disagreement with empirical observation. Further, the formula
excludes the most-used titles, which in many actual situations may be
the most important. This exclusion is complicated by the fact that
exactly how many titles are to be excluded is undefined. This number is
usually determined by the process of inspection, a rather arbitrary
procedure.
In spite of the problems, the formula given above is generally taken
as the source of expected values. This means that one must obtain values
for k and s, the two constants in the equation. These are obtained by
recognizing that if ideal data were plotted with one axis for R(n)
(cumulative articles) and the other for log (n) (log rank), the result
would be a straight line. The variables k and s represent the slope and
intercept, respectively, of that line. The usual process for obtaining
these values follows. First, the data are plotted on semilogarithmic
graph paper. Next, a straight line is drawn through some central portion of the curve. This offers the investigator an arbitrary choice as to
how much of the data to use and exactly what straight line best fits
those data. The value of the slope (k) is determined for the line. This is
often done by using only two points, thus introducing further arbitrariness. The intercept (s) is obtained either by graphical extrapolation or
by using the slope and a point on the line.
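The graphical procedure can be imitated numerically. The sketch below is one possible reading and not the method of any study cited here: it replaces the hand-drawn line with an ordinary least-squares fit of R(n) against the natural logarithm of n, which is an assumption about how one might reduce the arbitrariness of using two points. Because R(n) = k log(n/s) expands to k log(n) - k log(s), the slope of the fitted line is k and s is recovered from the intercept as exp(-intercept / k). The function name and the ideal data are hypothetical, and the choice of which ranks to include remains as arbitrary as in the graphical method.

```python
import math


def fit_bradford(ranks, cumulative_articles):
    """Least-squares fit of R(n) = k * ln(n / s) over the given points.
    Returns the pair (k, s)."""
    xs = [math.log(n) for n in ranks]
    ys = list(cumulative_articles)
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                      # slope of the straight line
    intercept = mean_y - k * mean_x    # equals -k * ln(s)
    s = math.exp(-intercept / k)
    return k, s


# Hypothetical ideal data for ranks 10 to 100, i.e., a "central
# portion" that leaves out the top-ranked titles.
true_k, true_s = 120.0, 2.5
ranks = list(range(10, 101))
cumulative = [true_k * math.log(n / true_s) for n in ranks]
k, s = fit_bradford(ranks, cumulative)
print(round(k, 3), round(s, 3))
```

On observed data the recovered constants will shift with the range of ranks supplied, which is a numerical restatement of the arbitrariness described above.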
Even if the sample sizes are the same, it is still difficult to determine
if two data sets should be considered identical within the limits of
sampling error. This problem frequently arises when samples are taken
in the same situation but at different times. Some of the variation in the
rankings of titles will be due to sampling error. But changes in rank may
also reflect real changes in the use of a title. The sample sizes required to
resolve this issue are very large indeed. For example, Brookes has calculated that to achieve a 95 percent confidence level that two adjacent titles
should not reverse their order, a sample size of several thousand is
required if the titles are high (e.g., 5 or 6) in the ranking. The resolution
of lower-ranked pairs requires much larger samples (tens or hundreds of
thousands). Consideration of these sample sizes should make any
researcher cautious in accepting the accuracy of empirical data.
Summary
The literature on Bradford's law presents the casual reader with a
number of pitfalls. The first problem is to distinguish theoretical from
empirical research. Theoretical work is aimed at understanding a random probabilistic process. To this end, assumptions are made which aid
mathematical manipulation. Empirical studies concentrate on describing the world from a practitioner's point of view. In these studies the
descriptive qualities of the data are more important than the statistical
aspects. A second problem is the large number of marginal claims in
the literature, that is, claims which are clearly speculative or are simply
unsupported. Some of this writing is not intended for acceptance without further study. Other articles are simply weak scholarship. In both
cases the reader must decide what to reject.
Between theory and empiricism lies a gap. This gap is the fact that
at present, the intellectual richness of real situations is not represented
in the mathematical austerity of the theoretical equations. It remains to
be seen if this gap can be bridged by further research.
Overall, Bradford's law represents an elusive phenomenon. On one
hand, it is easy to observe in real situations and can be represented with a
fairly simple mathematical formula. On the other hand, Bradford-type
data resist statistical testing, and the model fails to reveal the underlying
process which causes the distribution. In any case, the wise reader will
examine any study of Bradford's law closely before rushing to believe
more than is actually stated and supported.
References
1. Bradford, Samuel C. Sources of Information on Specific Subjects. Engineering 137(26 Jan. 1934):85-86; and ________. Documentation. Washington, D.C.: Public Affairs Press, 1950.
2. Vickery, B.C. Bradford's Law of Scattering. Journal of Documentation 4(Dec. 1948):198-203.
3. Kendall, M.G. The Bibliography of Operational Research. Operational Research Quarterly 11(March/June 1960):31-36.
4. Leimkuhler, Ferdinand F. The Bradford Distribution. Journal of Documentation 23(Sept. 1967):197-207.
5. Brookes, Bertram C. The Derivation and Application of the Bradford-Zipf Distribution. Journal of Documentation 24(Dec. 1968):247-65; ________. Bradford's Law and the Bibliography of Science. Nature 224(6 Dec. 1969):953-56; and ________. Obsolescence of Special Library Periodicals: Sampling Errors and Utility Contours. Journal of the ASIS 21(Sept.-Oct. 1970):320-29.
6. Price, Derek de Solla. A General Theory of Bibliometric and Other Cumulative Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
7. Drott, M. Carl, et al. Bradford's Law and Libraries: Present Applications-Potential Promised. ASLIB Proceedings 31(June 1979):296-304.
8. Mosteller, Frederick, and Wallace, David L. Inference and Disputed Authorship: The Federalist. Reading, Mass.: Addison-Wesley, 1961.
9. Brookes, Bertram C. Theory of the Bradford Law. Journal of Documentation 33(Sept. 1977):180-209.
10. Ibid.
11. Pratt, Allan D. A Measure of Class Concentration in Bibliometrics. Journal of the ASIS 28(Sept. 1977):285-92.
12. Lawani, S.M. Periodical Literature of Tropical and Subtropical Agriculture. Unesco Bulletin for Libraries 26(March-April 1972):88-93; and ________. Bradford's Law and the Literature of Agriculture. International Library Review 5(July 1973):341-50.
13. Goffman, William, and Morris, Thomas G. Bradford's Law Applied to the Maintenance of Library Collections. In Introduction to Information Science, edited by Tefko Saracevic, pp. 200-03. New York: Bowker, 1970.
Introduction
ONE OF THE MOST PUZZLING phenomena in bibliometrics (and, more
broadly, in quantitative linguistics) is Zipf's law. As one commentator,
the statistician Gustav Herdan, has put it: "Mathematicians believe in
[Zipf's law] because they think that linguists have established it to be a
linguistic law, and linguists believe in it because they, on their part,
think that mathematicians have established it to be a mathematical
law."
Let us start by considering a basic form of Zipf's law. Suppose one
has a natural-language corpus, e.g., a book written in English. Next,
suppose one makes a frequency count of the words in the corpus, i.e.,
counts the number of occurrences of the, and, of, etc. Finally, suppose
one arranges the words in decreasing order of frequency so that the most
frequent word has rank 1; the next most frequent, rank 2; and so on.
For example, a frequency count of the 75 word-types (i.e., dictionary entries) represented by the 142 word-tokens (i.e., distinct occurrences) in the two preceding paragraphs yields the partial results shown in
table 1. This set of rank-ordered frequency counts, though quite small
for the purpose, serves moderately well as an illustration of the fact that
rank and frequency have a surprisingly constrained relationship in
natural-language corpora. The values of the products of rank r and
frequency f fall in the relatively limited range 27-30 in the middle of
table 1, and we may note that there was no a priori reason for us to expect
that the middle products rf would fall within so limited a range.
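The arithmetic behind such a table is easily mechanized. The sketch below is illustrative only: the function name is hypothetical, and the frequencies supplied (9, 7, 6, 5, 4, 3, 2 and 1) are not quoted from the text but are the values implied by dividing each product in table 1 by its mean rank. Word-types with equal frequency share the mean of the ranks they occupy, as in the table.

```python
from collections import Counter


def mean_rank_products(frequencies):
    """Group word-types by frequency, give tied types the mean of the
    ranks they occupy, and return rows of
    (first_rank, last_rank, mean_rank, frequency, product)."""
    types_per_frequency = Counter(frequencies)
    rows = []
    next_rank = 1
    for f in sorted(types_per_frequency, reverse=True):
        n_types = types_per_frequency[f]
        first, last = next_rank, next_rank + n_types - 1
        mean_rank = (first + last) / 2
        rows.append((first, last, mean_rank, f, mean_rank * f))
        next_rank = last + 1
    return rows


# One entry per word-type: 1 type occurring 9 times, 2 types occurring
# 7 times, and so on down to 43 types occurring once (assumed values).
frequencies = ([9] + [7] * 2 + [6] * 2 + [5] + [4] * 2
               + [3] * 3 + [2] * 21 + [1] * 43)
rows = mean_rank_products(frequencies)
for first, last, mean_rank, f, product in rows:
    print(f"{first}-{last}  mean={mean_rank}  f={f}  rf={product}")
```

These assumed frequencies account for exactly 75 word-types and 142 word-tokens, in agreement with the counts given in the text.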
TABLE 1

Rank r                Word-Type                 Frequency f    Product rf
1                     the                            9             9.0
2-3, mean=2.5         in, of                         7            17.5
4-5, mean=4.5         a, one                         6            27.0
6                     law                            5            30.0
7-8, mean=7.5         and, it                        4            30.0
9-11, mean=10.0       suppose, that, Zipf's          3            30.0
12-32, mean=22.0      (21 words)                     2            44.0
33-75, mean=54.0      (43 words)                     1            54.0
$$r^{B} f = c \qquad (4)$$
Note that if B takes on the particular value 1, then equation (4) becomes
identical with equation (1). Thus, equation (4) is a generalization of
Zipf's law, and we shall refer to it as the generalized Zipf's law.
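Taking logarithms of equation (4) gives log f = log c - B log r, so that B is the negative of the slope of a straight line on a log-log plot of frequency against rank. A minimal sketch of estimating B and c by least squares follows; the function name and the synthetic data are hypothetical, and ordinary least squares on the logarithms is only one of several ways such a line might be fitted.

```python
import math


def fit_generalized_zipf(ranks, frequencies):
    """Fit log10(f) = log10(c) - B * log10(r) by least squares.
    Returns the pair (B, c)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(f) for f in frequencies]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, 10 ** intercept


# Synthetic rank-frequency pairs that obey equation (4) exactly,
# with assumed values B = 0.92 and c = 1000.
true_b, true_c = 0.92, 1000.0
ranks = list(range(1, 201))
frequencies = [true_c / r ** true_b for r in ranks]
b, c = fit_generalized_zipf(ranks, frequencies)
print(round(b, 4), round(c, 2))
```

Setting B = 1 in the synthetic data reduces the fit to the basic form of the law, in which the product of rank and frequency is constant.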
Fig. 1. Observed Rank-Frequency Pairs for a Corpus of 21,354 Words (axis label: log rank)
The solid line is the regression line for the data and has slope -0.92; the dashed
line has slope -1.0.
Source: Wyllys, Ronald E. The Measurement of Jargon Standardization in Scientific
Writing Using Rank-Frequency (Zipf) Curves. Ph.D. diss., University of Wisconsin-Madison, 1974.
It should be noted that Zipf's law only approximates the relationship between rank r and frequency f for any actual corpus. Zipf's work
shows that the approximation is much better for the middle ranks than
for the very lowest and the very highest ranks, and his work with
samples of various sizes suggests that the corpus should consist of at
It is interesting to note that, unfortunately, the critics of quantitative
analysis are still very much with us nearly fifty years later.
In his next book, The Psycho-Biology of Language, published in
1935, Zipf called attention for the first time to the phenomenon that has
come to bear his name. This book contained Zipf's first diagram of the
log(frequency)-v.-log(rank) relationship, a Zipf curve for his count of
words in the Latin writings of Plautus.
Zipf's last book, Human Behavior and the Principle of Least Effort:
An Introduction to Human Ecology, appeared in 1949. As its title
indicates, this work is an exposition of what Zipf considered the fundamental reason for much of human behavior: the striving to minimize
effort. The diversity of phenomena to which Zipf was able to apply his
mathematical models, equations (1) and (2), is impressive.
Despite his strong defense of quantification, Zipf really did not
argue in quantitative terms. It is true that he performed counts of
linguistic phenomena, tabulated the counts, and displayed them. But
his mathematics were weak, and his energies were spent in philosophizing about the implications of his principles. Support for this comment
may be found in another passage from Selected Studies: "Before returning to linguistic considerations, let me say here for the sake of any
mathematician who may plan to formulate the ensuing data more
exactly, the ability of the highly intense positive to become the highly
intense negative, in my opinion introduces the devil into the formula in
the form of [the square root of -1]. And now to linguistics."
Zipf appears to this writer to have been poorly trained for dealing
with quantitative phenomena. His knowledge of mathematics was
minimal; of statistics, apparently nonexistent. He never showed interest
in exploring the quantitative nature of his data beyond noting that they
came close to his model of the moment. This done, he would launch
into lengthy speculations about hazily defined possible causes. It is a
pity that he almost never collaborated with statisticians. On the other
hand, he was an indefatigable worker, and pursued the rank-frequency
phenomenon and related ideas for twenty years despite often harsh
criticism. There can be little doubt that the ubiquity of these phenomena would be less well recognized were it not for his work.
Alternative Forms of Zipf's Law
In Human Behavior and the Principle of Least Effort, Zipf presented an interesting exception to his usual insistence that the slope of
linguistic Zipf curves is -1, i.e., that only equation (1), and not equation
(4), applies to linguistic data. He noted that frequency counts of the
ble future problems. That in turn means that the person will strive to
minimize the probable average rate of his work-expenditure (over
Pf= X
for f=l
(7.1)
(7.2)
that 0 < a < x. The function is due to Irwin,lg who discovered it in a
search for distributions useful in biology, and who credited Waring
with discovery of the basic inverse factorial expansion underlying the
probability function. Since it was Herdan who recognized that Irwins
result had linguistic applications, the function has come to be known as
the Waring-Herdan formula in linguistics. Several investigators have
reported that i t fits observed rank-frequency data well. Good fits to
observed rank-frequency data by another model, the lognormal distribution, have been reported by V. Belevitch and John B. Carroll.21
Bruce M. Hill and Michael WoodroofeZ3have pursued the derivation of a probabilistic form of Zipfs law by applying Bose-Einstein and
Maxwell-Boltzmann statistics to the classical occupancy problem. A
similar derivation has been offered by Yuji Ijiri and H.A. Simon.*
These papers employ various initial conditions to yield various of the
Zipf, Bradford and other related distributions. The interrelatedness of
these distributions has been shown by, inter alios, Bertram C. BrookesE
and Robert A. Fairthorne.26
A different starting point has been suggested by H.S. Sichel. He
assumes that each word in ...[an author's vocabulary has] a long-term
probability of occurrence.27 The mixing of thousands of such probabilities during the production of speech or writing can be expressed as a
compound Poisson probability, of which a number of known [distribution functions] such as the Poisson, negative binomial, geometric,
Fisher's logarithmic, ... Yule, Good, Waring and Riemann distributions
are ... limiting forms.28 Sichel reports very close fits of his model to some
twenty published frequency counts. A related paper by B.C. Brookes29
treats a model of a very mixed Poisson process, and another article by
Brookes and Jose M. Griffiths30 derives from this process a frequency-transfer coefficient as a means of measuring the correlation of frequency and rank. Empirical tests of the theories are sufficiently rare that
reports of such tests by Beth Krevitt and Belver C. Griffith31 and by Anita
Parunak32 deserve mention.
The negative binomial distribution has been the starting point for
other investigations, including one by B.M. Hill treating the number-of-species problem but mentioning its relation to Zipf's law.33 A major
effort along these lines is that of Derek de Solla Price, who has developed
a modification of the negative binomial that he calls the cumulative
advantage distribution (CAD). In the CAD the conditions of the negative binomial are modified so that success increases the chance of
further success, but unlike in the negative binomial: failure has no
subsequent effect in changing probabilities .... Failure does not consti
What implications does Zipf's law have for the design of information systems? The honest answer has to be few, if any. So far as vocabulary control is concerned, Zipf's law offers no useful information beyond
what frequency-counts alone can easily supply. The present writer has
suggested that different subject-fields may be characterized by different
slopes of Zipf curves,3 but again this possibility seems to have no
practical applications at present in information system design. Perhaps
such applications will develop in the future. Meanwhile, we can continue to surprise ourselves with the ubiquity of the Zipf phenomenon
and to enjoy the intellectual challenge of achieving a full, rational
understanding of it.
References
1. Herdan, Gustav. The Advanced Theory of Language as Choice and Chance. Berlin: Springer-Verlag, 1966, p. 33.
2. See, for example, Zipf, George K. Human Behavior and the Principle of Least Effort. Cambridge, Mass.: Addison-Wesley, 1949. Reprint ed., New York: Hafner, 1965.
3. Ibid., p. 291.
4. ________. Selected Studies of the Principle of Relative Frequency in Language. Cambridge, Mass.: Harvard University Press, 1932.
5. Ibid., p. 9.
6. ________. The Psycho-Biology of Language. Boston: Houghton Mifflin, 1935. Reprint ed., Cambridge, Mass.: MIT Press, 1965.
7. ________, Human Behavior.
8. ________, Selected Studies, p. 21.
9. ________, Human Behavior, pp. 295-96.
10. Wyllys, Ronald E. The Measurement of Jargon Standardization in Scientific Writing Using Rank-Frequency (Zipf) Curves. Ph.D. diss., University of Wisconsin-Madison, 1974.
11. Mandelbrot, Benoit. Structure formelle des textes et communication. Word 10(1954):1-27, 424-25.
12. ________. An Informational Theory of the Statistical Structure of Language. In Communication Theory: Papers Read at a Symposium on Applications of Communication Theory, edited by Willis Jackson, pp. 486-502. London: Butterworths, 1953.
13. Edmundson, Harold P. The Rank Hypothesis: A Statistical Relation between Rank and Frequency. Technical report TR-186. College Park: Computer Science Center, University of Maryland, 1972.
14. Bookstein, Abraham. The Bibliometric Distributions. Library Quarterly 46(Oct. 1976):416-23; and ________. Explanations of the Bibliometric Laws. Collection Management 3(Summer/Fall 1979):151-62.
15. Simon, Herbert A. On a Class of Skew Distribution Functions. Biometrika 42(Dec. 1955):425. Reprinted in Models of Man: Social and Rational. New York: Wiley, 1957, pp. 145-64. See also ________. Some Further Notes on a Class of Skew Distribution Functions. Information and Control 3(March 1960):80-88.
16. Simon, On a Class of Skew Distribution Functions, p. 425.
17. Zipf, Human Behavior and the Principle of Least Effort, p. 1.
18. Herdan, Gustav. Quantitative Linguistics. London: Butterworths, 1964, pp. 85-88.
Introduction
OVER THE PAST fifty years, a sizable body of literature dealing with
bibliometric models has developed. The early models were proposed
because they were observed to fit graphically certain specific empirical
frequency distributions. In many cases their functional forms were
identical, the similarity only noted by other writers years later. In each
case, depending on the subject field they applied to, there was a proliferation of papers which modified, extended, clarified, applied, and generalized the initial model.
Almost all bibliometric models relate, in a simple functional form,
one variable with another variable. For example, in journal productivity studies, for a bibliography covering a certain span of years on a
particular subject, a few journals contribute a large number of articles,
other journals contribute fewer, and so on in a monotonic sequence
ending with a large number of journals contributing one article each to
the subject. The two variables are number of journals and number of
articles. After arranging the journals in a decreasing order of productivity, a frequency-size distribution is obtained for the number of journals
containing a fixed number of articles each. Conversely, a frequency-rank table can be constructed for the number of articles associated with a
journal of fixed rank. These two approaches to observed patterns form
the two modes of the data tabulations.
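As a minimal sketch of the two modes of tabulation, the following Python fragment builds both tables from a list of per-journal article counts. The counts and the function names (`frequency_size`, `frequency_rank`) are hypothetical and serve only as illustration; tied journals are given the largest rank they jointly occupy, which is an assumed convention rather than the only possible one.

```python
from collections import Counter

def frequency_size(counts):
    """Return {n: f(n)}: the number of journals contributing exactly n articles."""
    return dict(sorted(Counter(counts).items()))

def frequency_rank(counts):
    """Return [(r, g)]: g articles contributed by the journal(s) of rank r.

    Journals are ranked by decreasing productivity; tied journals share
    the largest rank in their group (an assumed convention).
    """
    table = []
    rank = 0
    for n, f in sorted(Counter(counts).items(), reverse=True):
        rank += f  # last rank occupied by this group of tied journals
        table.append((rank, n))
    return table

# Hypothetical bibliography: articles contributed by each of 8 journals.
articles_per_journal = [9, 5, 3, 3, 1, 1, 1, 1]

print(frequency_size(articles_per_journal))  # {1: 4, 3: 2, 5: 1, 9: 1}
print(frequency_rank(articles_per_journal))  # [(1, 9), (2, 5), (4, 3), (8, 1)]
```

Both tables carry the same information; the frequency-size form groups journals by productivity, while the frequency-rank form orders them from most to least productive.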
TABLE 1
FREQUENCY-SIZE DISTRIBUTION OF THE NUMBER OF JOURNALS f(n) CONTRIBUTING n ARTICLES EACH

   n     f(n)     nf(n)
   1     102      102
   2      25       50
   3      13       39
   4       2        8
   5       7       35
   6       1        6
   7       3       21
   8       3       24
   9       1        9
  10       2       20
  13       2       26
  15       1       15
  18       1       18
  22       1       22
 Sum   J = 164   N = 395
In the last twenty-five years, it has been observed that such tabulations occur for other pairs of variables from a wide variety of natural and
social phenomena. Table 2 provides some examples of such combinations of observation versus class relationship.
To understand the frequency-rank approach, consider the example
given in table 1. Near the bottom of the table there is one journal
contributing the most (twenty-two) articles. This journal is assigned the
rank 1. The next most productive journal is assigned rank 2 because it
TABLE 2
EXAMPLES OF OBSERVATION-CLASS RELATIONSHIP

Observation              Class
Number of articles       journals
Number of citations      persons
Number of insects        species
Length of word           words
Number of papers         authors
Number of occurrences    initial digits
Checked-out frequency    books
Number of occurrences    nouns
Length of sentence       sentences
Number of phonemes       words
Income level             persons
TABLE 3
A FREQUENCY-RANK DISTRIBUTION OF THE NUMBER OF ARTICLES g(r) CONTRIBUTED BY A JOURNAL OF RANK r

    r    g(r)
    1     22
    2     18
    3     15
    5     13
    7     10
    8      9
   11      8
   14      7
   15      6
   22      5
   24      4
   37      3
   62      2
  164      1
present scope and purpose of this article. However, each article in the
appendix to this paper contains a model which would be included in
this list because each adequately fits and models some form of tabulation. One word of caution is necessary: some of the models have been
declared as new and general, while others are self-declared and are
neither new nor general. There are survey articles on many of these
models, and some of these articles provide the mathematical equations,
historical developments, interrelationships, and examples of data sets
where the models have been useful.2
There are three models which are claimed to be general because
they possess two important properties: first, they include earlier models
as special cases; and second, they are applicable to a large class of
bibliometric variables. These are the models of Price, Bookstein and
Brookes. Bookstein especially has claimed that the major bibliometric
models-Bradford, Lotka and Zipf-are in fact a single law that seems
capable of describing phenomena in a vast variety of subject areas.3
The three models of Price, Bookstein and Brookes are discussed in the
following sections, with special attention to their derivations and to
their appropriateness as general models that can account for some of the
individual models mentioned above.
The Price model4 is also known as the cumulative advantage distribution (CAD) and can be defined as follows: if f(n) is the fraction of
contributors having n articles each, then f(n) = (m + 1)B(n, m + 2), for n
= 1, 2, ..., with the parameter m > 0, and B(·, ·) is the Beta function. The
Beta function is a name for a fundamental integral* involving two
parameters, and there is no simple verbal expression for this function.5
The CAD was proposed as a frequency-size type model because it yields
the relative frequency or proportion of authors each of whom has
produced a fixed number of articles on a specific area over a fixed period
of time. Over a finite range of observational values of n, a distribution of
authors is obtained, and the model can be fitted so as to follow closely
the observed pattern. When the fit is statistically adequate it can be used,
for example, to predict the percentage of authors who have contributed
more than n papers each, and if n is large, this provides an estimate of
the set of so-called prolific authors on a subject area. Other important
uses such as in citation analysis have been illustrated by Price.
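The definition of the CAD can be evaluated numerically. The sketch below is an illustration only: the helper names (`beta`, `cad_fraction`, `fraction_with_more_than`) and the parameter value m = 1 are assumptions, not values fitted to any data set. The Beta function is computed through the log-gamma function, using the standard identity B(a, b) = Γ(a)Γ(b)/Γ(a + b).

```python
import math

def beta(a, b):
    """Beta function B(a, b), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def cad_fraction(n, m):
    """f(n) = (m + 1) * B(n, m + 2): fraction of contributors with n articles."""
    return (m + 1) * beta(n, m + 2)

def fraction_with_more_than(n, m):
    """Fraction of contributors with more than n articles: 1 - sum of f(k), k <= n."""
    return 1.0 - sum(cad_fraction(k, m) for k in range(1, n + 1))

m = 1.0  # hypothetical parameter value, not estimated from data
for n in (1, 2, 5, 10):
    print(n, round(cad_fraction(n, m), 4), round(fraction_with_more_than(n, m), 4))
```

With a fitted value of m, the last function would give the kind of estimate described above for the share of prolific authors.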
This model has as a rough approximation that f(n) is proportional
to $n^{-a}$,
where a > 0. This implies that as n increases, f(n) decreases, which
suggests that there are many authors having one paper each, and so on
in a decreasing fashion, with very few authors contributing many
papers. There is only one parameter in the model, and its value depends
on a particular data set. Price himself considers his model to be quite
general: "It provides a sound conceptual basis for such empirical laws as
the Lotka Distribution for Scientific Productivity, the Bradford Law for
Journal Use, the Pareto Law of Income Distribution, and the Zipf Law
for Literary Word Frequencies. It is therefore an underlying probability
mechanism of widespread application and versatility throughout the
social sciences."6
How does one obtain such a model? The early attempts before 1950
by Yule, Pareto, Zipf, and Bradford were based on plotting the data with
f(n) versus n, for example, then finding a mathematical equation which
would adequately represent the pattern observed in the particular discipline (Yule in biology, Pareto in economics, Zipf in linguistics, and
*The Beta function is also known as Euler's first integral and is defined as:
$$B(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx = \frac{(a-1)!\,(b-1)!}{(a+b-1)!}, \qquad a > 1,\ b > 1,$$
special case. (Lotka's model is also known as the inverse-square law, and
essentially states that f(n) is proportional to $1/n^2$, for n = 2, 3, ... .) The
second constraint is that if a publication distribution is observed over t
time periods (e.g., t = 10 years), then the function f should satisfy the
relation $f(tn) = f(t) \times f(n)$. Bookstein calls this the "symmetry property"
or the "invariance property."11 Bookstein claims that the only realistic
function satisfying these conditions and empirical data is f(n) proportional to $1/n^a$, where a is a positive number and estimable from the data.
(It is true that for this model equation we have Lotka's law when a = 2,
and furthermore, the symmetry property is satisfied since $f(tn) = 1/(tn)^a = (1/t^a)(1/n^a) = f(t)f(n)$.)
It is also claimed that the model is the only one
which is unchanged whether the population of authors under study
remains the same, increases or decreases over time.12 This claim has not
been convincingly demonstrated.
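The symmetry property can be checked numerically for any candidate function. The sketch below is illustrative only: the function names and the exponential counter-example are assumptions introduced here, and a numerical check on a finite grid is of course not a proof.

```python
import math

def power_law(n, a=2.0):
    """Unnormalized power law f(n) = 1 / n**a (a = 2 gives the inverse-square form)."""
    return n ** (-a)

def exponential(n):
    """A hypothetical alternative, f(n) = exp(-n), used only as a counter-example."""
    return math.exp(-n)

def satisfies_symmetry(f, values, rel_tol=1e-12):
    """Check f(t * n) == f(t) * f(n), up to rounding, for all t, n in values."""
    return all(
        math.isclose(f(t * n), f(t) * f(n), rel_tol=rel_tol)
        for t in values
        for n in values
    )

grid = range(1, 11)
print(satisfies_symmetry(power_law, grid))    # True
print(satisfies_symmetry(exponential, grid))  # False
```

The power law passes because (tn)^a factors into t^a times n^a; the exponential fails already at t = 1, where f(n) would have to equal f(1)f(n).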
There are four important observations which can be made about
this model:
1. The model equation is a special case of the model equation involving
the Beta function advocated by both Simon and Price. In fact, Bookstein recognizes this: "Simon's model and mine ...are not identical,
they converge at large n."13
2. The model equation is not the only possible equation satisfying his
two constraints.
3. The path to the model is different from the other paths discussed
earlier. In 1924 Yule used the empirical data fitting technique; in
1955 Simon used stochastic birth process assumptions; in 1976 Price
used the urn scheme mechanism; and in 1977 Bookstein used symmetry and other conditions to establish the model.
4. The model is not original. The form of the Bookstein model equation appears in earlier papers, as demonstrated in Fairthorne and
Hubert,14 where we see that the very early models of Pareto, Zipf and
Stevens, and later Naranan15 are exactly this model for the frequency-size tabulation. Hubert has proposed this same model equation for
the frequency-rank tabulation.16
The implication of the first observation is that the Bookstein model is a
special case of the model involving the Beta function. Therefore, in this
sense, the Bookstein model is less general. Also, since the model involving the Beta function fits many observable variables, because it is so
adjustable to a variety of shapes, and since the form $n^{-a}$ is not as
adjustable, then, in this sense, the Bookstein model is less general. We
will return to the property of generality in a later section.
j =r
1. The variable r acts as a rank because it is equivalent to the maximum-rank assignment scheme mentioned earlier.
TABLE 4
THE BRADFORD-TYPE TABULATION OF THE ACCUMULATED NUMBER OF REFERENCES G(r) CONTAINED IN THE FIRST r MOST PRODUCTIVE JOURNALS

Accumulated No. of Journals r    G(r)
              1                    22
              2                    40
              3                    55
              5                    81
              7                   101
              8                   110
             11                   134
             14                   155
             15                   161
             22                   196
             24                   204
             37                   243
             62                   293
            164                   395
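The Bradford-type tabulation of table 4 can be derived from the frequency-size data of table 1 by accumulating journals and articles from the most productive group downward. The following is a minimal sketch; the function name `bradford_tabulation` is hypothetical, and the dictionary simply transcribes the n and f(n) columns of table 1.

```python
from itertools import accumulate

# Frequency-size data of table 1: n articles -> f(n) journals.
freq_size = {1: 102, 2: 25, 3: 13, 4: 2, 5: 7, 6: 1, 7: 3,
             8: 3, 9: 1, 10: 2, 13: 2, 15: 1, 18: 1, 22: 1}

def bradford_tabulation(freq_size):
    """Return [(r, G(r))]: r is the accumulated number of journals, taken
    most productive first, and G(r) the accumulated number of articles."""
    groups = sorted(freq_size.items(), reverse=True)   # largest n first
    ranks = accumulate(f for _, f in groups)           # cumulative journals
    totals = accumulate(n * f for n, f in groups)      # cumulative articles
    return list(zip(ranks, totals))

for r, g in bradford_tabulation(freq_size):
    print(f"{r:>4} {g:>4}")
```

The first row is the single journal with twenty-two articles, and the last row accounts for all 164 journals and 395 articles.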
Brookes calls his model the mixed Poisson model because the
derivation depends on a mix of Poisson random variables. In general
terms, the mix occurs as follows: for the sum $X_1 + X_2 + \cdots + X_n$ we assume
G(r)
{ t;b
r = 1, 2, ..., c,
log r, r = c + 1, c + 2, ..., n.
Notice that for r = 1, 2, ..., c the function is a curve, and for large values
the function is a straight line function of log r. To conform to Brookes's
new model and other observed distributions, he now suggests two
hybrids, called Type I and Type II, which he claims take the form:
$$G(r) = \log_b\Big[\Big(a + \sum_{j=0} \alpha^{c-j}\Big)\Big/a\Big], \qquad r = 1, 2, \ldots, c$$
where b = (a + n)/a and $\alpha < 1$ for Type I and $\alpha > 1$ for Type II.
Graphically, these functions appear in figure 1, where hybrid Type I is
convex initially and hybrid Type II is concave (with respect to the r-axis)
initially. The hybrids are consequences of his model and illustrate its
ability to adjust to anomalies.
In summary, the Brookes model is included in this article because
of its properties and its declared generality. To quote Brookes: The
main advantage of the model is that it shows how the log law, and
therefore how the hybrid forms of the Bradford law, can be derived in a
realistic and natural way from orthodox frequency statistics; and in
its present form it is the simplest possible stochastic model of the
Bradford law, but it can easily be modified, for example, to embrace
they include earlier models as special cases, and they are applicable to a
larger class of bibliometric variables. However, these general models are
limited in that they consider only the effect of one variable upon
another. Nature and life are not so simple. In fact, in bibliometrics,
recent articles have attempted to model one response variable as a
function of two or more variables. Also, on one source (journal, author,
etc.) more than one response variable has been measured. These two
approaches will change our definition of generality because such multivariate models will necessarily include the univariate models. It is a
simplistic viewpoint of reality to believe one variable in a social interactive process can be adequately predicted solely by one other variable. A
univariate model does not become more general by merely including
more parameters.
Examples of models of greater statistical sophistication can be
found: Bayesian models in interactive and retrieval systems,23 methods
for evaluating articles,24 stochastic literature growth models,25 modeling duration of book
measures of literature concentration using
the Whitworth model in frequency-rank distributions,27 modeling relationships between title length and number of coauthors,28 properties of
modeling,29 and prediction models using time-series methods.30
This latest research differs from earlier work in bibliometrics in
that it uses models that are nonlinear and that consider the effect of
several variables, i.e., they are multivariate. These models require the
estimation of at least two parameters, whereas the simpler univariate
models required only one. The maximum likelihood method, the minimum chi-square method, and the ordinary linear least-squares method
have been used. However, estimation for nonlinear functions requires
care. If a model is linear and of the form $Y = \alpha + \beta X + \epsilon$ (where the
random variable $\epsilon$ must have structure if confidence limits are to be
established), we speak of an additive model for the variable Y depending
on the variable X. If $Y = \alpha X^{\beta} \epsilon$, then this is an example of a multiplicative model. Taking logarithms on both sides, we have $\log Y = \log \alpha + \beta \log X + \log \epsilon$,
which is of the form $Y' = \alpha' + \beta X' + \epsilon'$. We have linearized
the model, where $\epsilon' = \log \epsilon$ is normally distributed when $\epsilon$ has a lognormal structure. For the nonlinear
model $Y = \alpha X^{\beta} + \epsilon$, taking logarithms yields $\log Y = \log(\alpha X^{\beta} + \epsilon)$, which
does not collapse into a linear form. This simple fact is often overlooked, and the estimation of parameters for such models requires
nonlinear estimation theory.31
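The linearization of the multiplicative model can be illustrated with ordinary least squares on the logarithms. The sketch below is an assumption-laden example: the function name `fit_multiplicative` and the synthetic, noise-free data are hypothetical, and the approach applies only when the error enters multiplicatively. It is not a substitute for nonlinear estimation in the additive-error case.

```python
import math

def fit_multiplicative(xs, ys):
    """Fit Y = alpha * X**beta by least squares on (log X, log Y).

    Returns (alpha, beta). Valid only for positive xs and ys.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    beta = sxy / sxx                           # slope of the log-log line
    alpha = math.exp(mean_y - beta * mean_x)   # back-transformed intercept
    return alpha, beta

# Synthetic, noise-free data generated from alpha = 3 and beta = -2.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.0 * x ** -2.0 for x in xs]
alpha, beta = fit_multiplicative(xs, ys)
print(round(alpha, 6), round(beta, 6))
```

On noise-free data the fit recovers the generating parameters up to rounding; with real data the log transformation also changes how errors are weighted, which is one reason the choice between the two error structures matters.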
The use of multivariate models also requires greater care. If Y is
found to be functionally dependent on p variables $X_1, X_2, \ldots, X_p$, such as
$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon$, then we have a multiple regression
model. If the response on a single subject is a set of variables $Y_1, \ldots, Y_q$,
which may be correlated and are functionally dependent on a set of
variables $X_1, \ldots, X_p$, then we have a multivariate regression model. The
latter situation can utilize techniques such as cluster, factor and multivariate time-series analyses. Although recent articles in retrieval systems
are using time-series methodology, the simpler models listed earlier in
this article are not multivariate, and it should be possible to exploit
multivariate methods to achieve clarity and more generality.
Summary
References
1. Hubert, John J. Analysis of Data by a Rank-Frequency Model. Ph.D. diss., State University of New York at Buffalo, 1974; Brookes, Bertram C., and Griffiths, Jose M. Frequency-Rank Distributions. Journal of the ASIS 29(Jan. 1978):5-13; Hubert, John J. Bibliometric Models for Journal Productivity. Social Indicators Research 4(Oct. 1977):441-73; and ________. A Relationship Between Two Forms of Bradford's Law. Journal of the ASIS 29(Jan. 1978):159-61.
2. Simon, Herbert A. On a Class of Skew Distribution Functions. Biometrika 42(Dec. 1955):425-40; Brookes and Griffiths, Frequency-Rank Distributions; Price, Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963; Fairthorne, Robert A. Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot) for Bibliometric Description and Prediction. Journal of Documentation 25(Dec. 1969):319-43; Brookes, Bertram C. Theory of the Bradford Law. Journal of Documentation 33(Sept. 1977):180-209; Bookstein, Abraham. The Bibliometric Distributions. Library Quarterly 46(Oct. 1976):416-23; and Hubert, Bibliometric Models.
3. Bookstein, Abraham. Explanations of the Bibliometric Laws. Collection Management 3(Summer/Fall 1979):151-62.
4. Price, Derek de Solla. A General Theory of Bibliometric and Other Cumulative Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
Appendix
Articles Containing Models of Bibliometric Phenomena
Benford, Frank. The Law of Anomalous Numbers. Proceedings of the American Philosophical Society 78(1938):551-72.
Bookstein, Abraham. Patterns of Scientific Productivity and Social Change: A Discussion of Lotka's Law and Bibliometric Symmetry. Journal of the ASIS 28(July 1977):206-10.
Bradford, S.C. Sources of Information on Specific Subjects. Engineering 137(26 Jan. 1934):85-86.
Brookes, Bertram C. The Derivation and Application of the Bradford-Zipf Distribution. Journal of Documentation 24(1968):247-65.
________, and Griffiths, J.M. Frequency-Rank Distributions. Journal of the ASIS 29(1978):5-13.
Cole, P.F. A New Look at Reference Scattering. Journal of Documentation 18(June 1962):58-64.
Goffman, William, and Newill, Vaun A. Generalization of Epidemic Theory: An Application to the Transmission of Ideas. Nature 204(17 Oct. 1964):225-28.
Good, I.J. Distribution of Word Frequencies. Nature 179(16 March 1957):595.
________. The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika 40(Dec. 1953):237-64.
Harris, Bernard. Determining Bounds on Integrals with Application to Cataloging Problems. Annals of Mathematical Statistics 30(June 1959):521-48.
________. Statistical Inference in the Classical Occupancy Problem: Unbiased Estimation of the Number of Classes. Journal of the American Statistical Association 63(Sept. 1968):837-47.
Herdan, Gustav. Type-Token Mathematics: A Textbook of Mathematical Linguistics. The Hague: Mouton, 1960, pp. 182-85.
Hubert, John J. Analysis of Data by a Rank-Frequency Model. Ph.D. diss., Dept. of Statistics, SUNY-Buffalo, 1974.
Kendall, Maurice G. Natural Law in the Social Sciences. Journal of the Royal Statistical Society, Series A 124(1961):1-16.
Leimkuhler, Ferdinand. The Bradford Distribution. Journal of Documentation 23(Sept. 1967):197-207.
Lotka, A.J. The Frequency Distribution of Scientific Productivity. Journal of the Washington Academy of Sciences 16(1926):317-23.
Naranan, S. Power Law Relations in Science Bibliography-A Self-consistent Interpretation. Journal of Documentation 27(June 1971):83-97.
Pareto, Vilfredo. Cours d'Économie Politique. Lausanne: F. Rouge & Cie., 1896. See esp. vol. 2, sec. 3.
Plackett, R.L. The Truncated Poisson Distribution. Biometrics 9(Dec. 1953):485-88.
Price, Derek de Solla. A General Theory of Bibliometric and Other Cumulative Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
Rao, I.K. Ravichandra. The Distribution of Scientific Productivity and Social Change. Journal of the ASIS 31(March 1980):111-22.
Resnikoff, H.L., and Dolby, J.L. Access: A Study of Information Storage and Retrieval with Emphasis on Library Information Systems (Final Report HEW Proj. 8-0548, 1972).
Simon, Herbert A. On a Class of Skew Distribution Functions. Biometrika 42(Dec. 1955):425-40.
Stevens, S.S. On the Psychophysical Law. Psychological Review 64(1957):153-81.
Vickery, B.C. Bradford's Law of Scattering. Journal of Documentation 4(Dec. 1948):198-203.
Citation Analysis
LINDA C. SMITH
Introduction
discussed
Bavelas suggests that the two extremes of this array of reasons might be
true scholarly impact at the one end (e.g., significant use of the cited
author's theory, paradigm, or method) and less-than-noble purposes at
the other (e.g., citing the journal editor's work or plugging a friend's
publications).10 Furthermore, it is possible that norms for citing vary
from discipline to discipline.
Just as there are a number of reasons why citations exist, there may
be a number of reasons why a citing author has not provided a link to
certain other documents. Although the most obvious reason is that a
prior document is not relevant to the present work, it may also be due to
the fact that the author was not aware of the document, or could not
obtain it, or could not read the language in which it was published. As
Kochen observes: it is not surprising that there is a great deal of
arbitrariness in the way authors select references for their bibliographies. Undoubtedly, many documents which should have been cited are
missed; and many documents which the author does cite are only
slightly relevant.11
In spite of the uncertainties associated with the nature of the
citation relationship, citations are attractive subjects of study because
they are both unobtrusive and readily available. Unlike data obtained by
interview and questionnaire, citations are unobtrusive measures that do
not require the cooperation of a respondent and that do not themselves
contaminate the response (i.e., they are nonreactive).12 Citations are
signposts left behind after information has been utilized and as such
provide data by which one may build pictures of user behavior without
ever confronting the user himself. Any set of documents containing
reference lists can provide the raw material for citation analysis, and
citation counts based on a given set of documents are precise and
objective.
Development of Citation Analysis
The development of citation analysis has been marked by the
invention of new techniques and measures, the exploitation of new
tools, and the study of different units of analysis. These trends have led
to a rapid growth in both the number and types of studies using citation
analysis.
The easiest technique to use is a citation count, determining how
many citations have been received by a given document or set of documents over a period of time from a particular set of citing documents.
When this count is applied to articles appearing in a particular journal,
it can be refined by calculating the impact factor, the average number of
citations received by articles published in a journal during a specified
time period. This measure allows one to compare the impact of
journals which publish different numbers of articles. Pinski and Narin
have developed further refinements of citation counts which take into
account the length of papers, the prestige of the citing journal, and the
different referencing characteristics of different segments of the
literature.13
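A citation count and the journal-level average described above can be sketched in a few lines. The data, document identifiers, and function names below are hypothetical, and the average is computed exactly as defined in the text (citations received divided by articles published); the refinements of Pinski and Narin are not attempted here.

```python
def citation_count(citations, documents):
    """Total citations received by the given documents.

    `citations` maps a document identifier to the number of citations it
    received from the chosen set of citing documents in the chosen period.
    """
    return sum(citations.get(doc, 0) for doc in documents)

def impact_factor(citations, journal_articles):
    """Average number of citations per article published in one journal."""
    if not journal_articles:
        raise ValueError("journal has no articles in the period")
    return citation_count(citations, journal_articles) / len(journal_articles)

# Hypothetical counts for two journals of different size.
citations = {"A1": 10, "A2": 2, "B1": 4, "B2": 3, "B3": 1, "B4": 0}
journal_a = ["A1", "A2"]
journal_b = ["B1", "B2", "B3", "B4"]

print(citation_count(citations, journal_a), impact_factor(citations, journal_a))
print(citation_count(citations, journal_b), impact_factor(citations, journal_b))
```

Dividing by the number of articles is what makes journals of different size comparable: a raw count alone would favor the journal that simply publishes more.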
Two techniques have been devised to identify documents likely to
be closely related: bibliographic coupling and cocitation analysis.14
Two documents are bibliographically coupled if their reference lists
share one or more of the same cited documents. Two documents are
cocited when they are jointly cited in one or more subsequently published documents. Thus in cocitation earlier documents become linked
because they are later cited together; in bibliographic coupling later
documents become linked because they cite the same earlier documents.
The difference is that bibliographic coupling is an association intrinsic
to the documents (static), while cocitation is a linkage extrinsic to the
documents, and one that is valid only so long as they continue to be
cocited (dynamic).16 The theory and practical applications of bibliographic coupling and cocitation analysis have been reviewed by Weinberg and Bellardo, respectively.17
Citation counts and bibliographic
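The two definitions given above, bibliographic coupling and cocitation, can be made concrete with a small sketch. The reference lists, document labels, and function names are hypothetical; the only assumption about the data is that each citing document is represented by the set of documents it cites.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: citing document -> set of documents it cites.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"A", "D"},
}

def coupling_strength(references):
    """For each pair of citing documents, count the cited documents they share."""
    strengths = {}
    for p, q in combinations(sorted(references), 2):
        shared = references[p] & references[q]
        if shared:
            strengths[(p, q)] = len(shared)
    return strengths

def cocitation_counts(references):
    """For each pair of cited documents, count the documents citing both."""
    counts = Counter()
    for cited in references.values():
        counts.update(combinations(sorted(cited), 2))
    return dict(counts)

print(coupling_strength(references))
print(cocitation_counts(references))
```

Coupling strengths depend only on the reference lists of the two documents compared, whereas the cocitation counts would change whenever a new citing document is added to `references`, which reflects the static versus dynamic distinction drawn in the text.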
advocates of citation analysis recognize its limitations and exercise care
in its applications.23 Unfortunately, other investigators seem to be
unaware of these limitations and misinterpret the results of their analyses. This section of the paper will enumerate both the assumptions
underlying citation analysis and the limitations of citation data, setting
the stage for the discussion of applications which follows.
Assumptions frequently underlying citation analysis are described
below, together with supporting evidence and/or counter-examples.
1. Citation of a document implies use of that document by the
citing author. This assumption actually has two parts: (1) the author
refers to all, or at least to the most important, documents used in the
preparation of his work; and (2) all documents listed were indeed used,
i.e., the author refers to a document only if that document has contributed to his work. Failure to meet these two conditions leads to sins of
omission and commission:24 certain documents are underrated because
not all items used were cited, and other documents are overrated because
not all items cited were used. With respect to underrating, it should be
evident to anyone who has written a paper that citation does not
necessarily fully and faithfully reflect usage. Often what is cited is only a
small percentage of what is read; not all that is read and found useful is
cited. Although the author usually does not provide any evidence of
omissions, there are exceptions. Consider a paper by Bottle which has as
its reference 29: Reference omitted to avoid embarrassing its author!25
With respect to overrating, Davies offers a fundamental law of reference giving: it is quite unnecessary to have read or even seen the
reference yourself before quoting it.26 Without looking at the text of
both the citing and cited documents, it may not be possible to make a
judgment as to whether a particular citation does indeed represent use of
material in the cited document.
2. Citation of a document (author, journal, etc.) reflects the merit
(quality, significance, impact) of that document (author, journal, etc.).
The underlying assumption in the use of citation counts as quality
indicators is that there is a high positive correlation between the number
of citations which a particular document (author, journal, etc.) receives
and the quality of that document (author, journal, etc.).27 The use of
citation analyses for evaluative purposes is the issue that has generated
the most discussion. While Bayer and Folger note that measures derived
from citation counts have high face validity,28 Thorne argues that
citation counts have spurious validity because documents can be cited
for reasons irrelevant to their merit.29 Nevertheless, this assumption has
been tested and has found support in a number of studies, including
this way is more closely related to the way citations are used by authors
in scientific papers.43 He notes that most citations are the author's own
private symbols for certain ideas he uses. Where documents are frequently cited, their use as concept symbols may be shared by a group of
scientists. Small has recently extended this approach through the development of cocitation context analysis.44 Statements characterizing the
structure of a cocitation map are obtained from an analysis of the
contexts or passages in which documents are cocited.
The difficulty with such intellectual refinements is the time
required to apply them. Human judgment is needed to analyze citation
contexts and make inferences, so studies employing intellectual refinements are likely to be limited in scope. Nevertheless, both mechanical
and intellectual refinements offer alternatives to treating citations as
masses of undifferentiated units. Although for some applications it is
sufficient to treat citations equally, for others it is appropriate to investigate the fine structure of citation practice.
Given the difficulties with the assumptions which underlie many
citation analyses, one must also be aware of the problems which can
exist in sources of citation data. Some of these problems are characteristic of all sources of citation data, while others only pose difficulties in
the use of secondary sources, the citation indexes. Cole and Cole discuss
many of these problems and ways of handling them in statistical analysis.45 Problems include:
1. Multiple authorship. Cited articles listed in the citation indexes include only the first-named authors. To find all citations to publications of a given author, including those in which he is not first author, one needs a bibliography of his works so that all articles can be
checked in the citation index. Errors can be introduced unless such
complete counts are made.& There is also the problem of allocating
credit in multiauthored works.47Should such works be treated the
same as single-authored works in citation counts or should credit be
divided proportionally? Should one consider the sequence of author
names in allocating credit, as this sequence often is an indication of
the contribution of each author to the work reported?
2. Self-citations. If self-citations are to be eliminated from citation
counts, this is easily done for papers written by a single author.
Again, multiauthored papers may require further checking. An even
more difficult problem is to eliminate group self-citations, i.e., references from any member(s)of a research group toany other member(s)
of that research group. In this case one would have to find a source
identifying all members of the research group.
indexes as well. An exception is the A&HCI, which includes implicit
citations when an article refers to and substantially discusses a work
but fails to include an explicit citation.52 But implicit citations are
also frequently found in the form of eponyms in the scientific literature. Furthermore, papers containing important ideas will not necessarily continue to be highly cited. Once an idea is sufficiently widely
known, citing the original version is unnecessary. If one were using
citation analysis to measure the impact of an individual author, such
implicit citations would fail to be included.
7. Fluctuations with time. There may be large variations in citation
counts from one year to another, so citation data should not be too restricted in time.
8. Field variations. Citation rates (citations per publication) vary greatly
in different fields, leading to difficulties in cross-discipline comparisons. Bates has proposed the criterion rate as a refinement of citation
rate, because citation counts as a measure of the quality of a
researcher's work are influenced not only by the inherent value of
that work, but also by the size of the pool of available citers in a given
field.53 A researcher's work can be evaluated in relation to a criterion
rate of citation, the citation rate of the top researchers in that field.
9. Errors. Of course, citation analyses, including those based on citation
indexes, can be no more accurate than the raw material used.
Although processing of citations for inclusion in citation indexes
may introduce some errors while eliminating others, many errors due
to citing authors remain. These can include errors in cited author
names, journal title, page, volume, and year. The incorrect citing of
sources is unfortunately far from uncommon. Two studies found the
percentage of error for citations from various journals to range from
10.7 to 50 percent.54
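The choices raised under multiple authorship and self-citations above (whole versus divided credit, and whether to drop self-citations) can be made concrete with a minimal sketch. The input format, the function name `citation_credit`, and the author names are hypothetical illustrations, not part of any citation index; equal division among coauthors is only one of the possible allocation rules mentioned in item 1.

```python
from collections import defaultdict

def citation_credit(citations, fractional=False, drop_self_citations=False):
    """Tally citation credit per cited author.

    `citations` is an iterable of (citing_authors, cited_authors) pairs,
    each a list of names in byline order (a hypothetical input format).
    """
    credit = defaultdict(float)
    for citing_authors, cited_authors in citations:
        # Treat a citation as a self-citation if any author is on both papers.
        if drop_self_citations and set(citing_authors) & set(cited_authors):
            continue
        # Whole counting gives each coauthor 1; fractional counting splits 1 equally.
        share = 1.0 / len(cited_authors) if fractional else 1.0
        for author in cited_authors:
            credit[author] += share
    return dict(credit)


citations = [
    (["Adams"], ["Baker", "Clark"]),
    (["Baker"], ["Baker", "Clark"]),   # self-citation by Baker
    (["Davis", "Evans"], ["Clark"]),
]

whole = citation_credit(citations)
split = citation_credit(citations, fractional=True, drop_self_citations=True)
print(whole)
print(split)
```

Under whole counting Baker and Clark receive full credit for every citation to their joint paper; with divided credit and self-citations removed, the same data yield noticeably smaller totals, which is why the counting rule should be stated in any study.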
This section has considered two types of limitations which can
affect citation analyses: the assumptions made may not be true, and the
data collected may have inadequacies. Invalid conclusions will be made
unless these limitations are taken into account in the design of a study
and in the interpretation of results. The most reliable results may be
expected when citation abuses and errors appear as noise under conditions of high signal-to-noise ratio, i.e., the noise represents only a
relatively small number of the citations analyzed.55 The limitations of
citation analysis do not negate its value as a research method when used
with care. There are, in fact, several application areas where citation
analysis has been used successfully.
Applications
The applications described in this section reflect two major
themes-use of citations as tools for the librarian and use of citations as
tools to analyze research activity. Citations and cocitations are part of
the range of empirical data available to historians and sociologists of
science, as well as to librarians. For each application area, representative
studies are mentioned to illustrate the types of questions which have
been investigated through citation analysis. In addition, weaknesses of
the method are identified, reflecting points made in the critique above.
1. Literature of studies. In this case one looks at citations in a
particular subject area to describe patterns of citation. The sources of
citation data may be as limited as a single journal in the field (e.g.,
Chen's study of references in articles appearing in the Bulletin of the
Medical Library Association56), or they may encompass many sources,
including types of material in addition to journals. Characteristics of
cited materials frequently examined include types, age, highly cited
authors and journals, languages and countries of origin, and subject
distributions.57 This type of study may also look for changes in these
characteristics over time. A major problem with these studies is their
lack of compatibility which makes comparisons and synthesis difficult.
One application which has been suggested for this type of study is the
definition of appropriate secondary service coverage and scope of retrospective bibliographies in a given subject area.58 By studying the range
of subjects, countries, languages, and document forms referred to by a
group of known core sources, one can begin to establish the boundaries
of a subject literature, with the limitation that citations do not reflect all
literature use. The value of this method in the determination of current
policies is a function of the extent to which these data can be projected
forward in time. Bibliographic coupling and cocitation have been used
to create mappings of the micro- and macrostructures and relationships
of disciplines.59 Small, for example, has used cocitation analysis to
explore the relationship of information science to the social sciences.60
2. Type of literature studies. Citation analysis can be used to
gauge the dissemination of results reported in certain types of literature,
such as government documents, dissertations, or the exchange literature
of regional scientific societies.61 The source of citations used for analysis
clearly can determine the generality of one's conclusions in this type of
study. Nelson, in a study of citations to art collection catalogs, remarks
that one must recognize the potential usefulness of what she terms
self-styled citation methods.62 In her case, citation analysis of the fine
arts nonserial literature was the appropriate approach. Such studies can
involve content analysis, documenting not only where but also how
certain types of literature have been used.
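The bibliographic coupling and cocitation measures mentioned under literature studies above can be sketched briefly. Two citing documents are bibliographically coupled when they share references, and two cited documents are cocited when they appear together in the same reference list. The document labels, the dictionary layout, and the function names below are hypothetical; this is an illustration of the two counts, not a reconstruction of any particular mapping study.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how many citing documents cite each pair of documents together."""
    counts = Counter()
    for refs in reference_lists.values():
        # Each unordered pair of distinct references is cocited once per citing paper.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def coupling_strength(reference_lists, doc_a, doc_b):
    """Number of references shared by two citing documents."""
    return len(set(reference_lists[doc_a]) & set(reference_lists[doc_b]))


# Hypothetical citing papers P1-P3 and the documents they reference.
reference_lists = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C", "D"],
}

cocited = cocitation_counts(reference_lists)
print(cocited[("A", "B")])                            # cocitation frequency of A and B
print(coupling_strength(reference_lists, "P1", "P3"))  # references shared by P1 and P3
```

Coupling strength is fixed once two papers are published, whereas cocitation counts can grow as new citing papers appear, which is one reason the two measures can yield different maps of the same literature.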
citation analysis is to apply multiple methods in the study of a phenomenon, as in the coupling of citation analysis and content analysis. As no
research method is without bias, citation analysis should be supplemented by methods testing the same variables but having different
methodological weaknesses. For example, to investigate communication patterns among scientists, one could supplement citation data with
those obtained via interview or questionnaire.
Not enough is known about the citation behavior of authors: why the author makes citations, why he makes his particular citations,
and how they reflect or do not reflect his actual research and use of the
literature. When more is learned about the actual norms and practices
involved, we will be in a better position to know whether (and in what
ways) it makes sense to use citation analysis in various application
areas.91 It would also be interesting to study in more detail the characteristics of documents which do not cite and/or are not cited, and to
identify characteristics of documents which can be used to predict
citedness.92
Advances in theory and practice have marked the development of
citation analysis, and researchers are likely to continue contributing in
both these areas. Gilbert, for example, has proposed a theory of citing
which views referencing as persuasion.93 In practice, simple citation
counts have been supplemented by bibliographic coupling, cocitation
analysis, evaluative bibliometrics, and cocitation context analysis. Garfield recently noted that one of the major methodological changes in his
studies in the near future will be to shift from counting citations to
counting authors influenced by.94
To conclude this paper, two questions affecting the future of citation analysis will be posed. Is it possible that increased use of citation
analysis will cause a change in citation behavior? How will citation
behavior be affected by the increased use of electronic media for generation, storage and dissemination of information? Although both questions have already received some attention in the literature, the
responses to them are necessarily somewhat speculative.
It has been suggested that the very existence of citation indexes and
the growing abundance of citation analyses will likely have various
feedback influences on the writing and citing habits of future authors.95
Just as authors may title their papers more carefully to ensure their
retrievability through keyword indexes, authors could be motivated to
acknowledge their intellectual debts to prior documents accurately, lest
their papers go undetected by the user of a citation index. Thus this
paper is titled "Citation Analysis" rather than the more metaphorical
"Standing on the Shoulders of Giants," and care has been taken to
References
1. Isaac Newton. Quoted in Robert K. Merton. On the Shoulders of Giants: A
Shandean Postscript. New York: Free Press, 1965.
2. Ziman, John M. Public Knowledge: An Essay Concerning the Social Dimension
of Science. Cambridge: Cambridge University Press, 1968, p. 58.
3. Narin, Francis, et al. Evaluative Bibliometrics: The Use of Publication and Citation Analysis in the Evaluation of Scientific Activity. Cherry Hill, N.J.: Computer
Horizons, Inc., 1976, pp. 334, 337. (PB 252 339)
4. Malin, Morton V. The Science Citation Index: A New Concept in Indexing.
Library Trends 16(Jan. 1968):376.
5. Gupta, B.M., and Nagpal, M.P.K. Citation Analysis and Its Applications: A Review. Herald of Library Science 18(Jan.-April 1979):86-93; Hall, Angela M. The Use and
Value of Citations: A State-of-the-Art Report. London: Information Service in Physics,
Electrotechnology and Control, 1970 (R70/4); Hjerppe, Roland. An Outline of Bibliometrics and Citation Analysis (TRITA-LIB-6014). Stockholm: Royal Institute of Technology Library, 1978. (ED 167 077); Martyn, John. Citation Analysis. Journal of
Documentation 31(Dec. 1975):290-97; Miller, Elizabeth, and Truesdell, Eugenia. Citation Indexing: History and Applications. Drexel Library Quarterly 8(April 1972):159-72; and Mitra, A.C. The Bibliographical Reference: A Review of Its Role. Annals of
Library Science and Documentation 17(Sept.-Dec. 1970):117-23.
6. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing &
Analysis (TRITA-LIB-2013). Stockholm: Royal Institute of Technology Library, 1980.
7. Garfield, Eugene. Citation Indexing-Its Theory and Application in Science,
Technology, and Humanities. New York: Wiley, 1979.
8. ________. Essays of an Information Scientist, 3 vols. Philadelphia: Institute
for Scientific Information Press, 1977, 1980.
9. ________. Can Citation Indexing Be Automated? In Statistical Association
Methods for Mechanized Documentation (NBS Misc. Pub. 269), edited by Mary E. Stevens,
et al., p. 189. Washington, D.C.: National Bureau of Standards, 1965.
10. Bavelas, Janet B. The Social Psychology of Citations. Canadian Psychological
Review 19(April 1978):10.
11. Kochen, Manfred. Principles of Information Retrieval. Los Angeles: Melville,
1974, p. 74.
12. Webb, Eugene J., et al. Unobtrusive Measures: Nonreactive Research in the
Social Sciences. Chicago: Rand McNally, 1966.
13. Pinski, Gabriel, and Narin, Francis. Citation Influence for Journal Aggregates
of Scientific Publications. Information Processing and Management 12(1976):297-312.
14. Kessler, M.M. An Experimental Study of Bibliographic Coupling Between
Technical Papers. IEEE Transactions on Information Theory IT-9(Jan. 1963):49-51;
and ________. Bibliographic Coupling Between Scientific Papers. American Documentation 14(Jan. 1963):10-25.
15. Marshakova, I.V. A System of Document Links Constructed on the Basis of Citations (According to the Science Citation Index). Automatic Documentation and
Mathematical Linguistics 7(1973):49-57. (English translation of article in Nauchno-Tekhnicheskaya
Informatsiya Seriya 2, no. 6, pp. 3-8, 1973); and Small, Henry. Co-Citation in the Scientific Literature: A New Measure of the Relationship Between Two
Documents. Journal of the ASIS 24(July-Aug. 1973):265-69.
16. Garfield, Eugene, et al. Citation Data as Science Indicators. In Toward a
Metric of Science: The Advent of Science Indicators, edited by Yehuda Elkana, et al., p.
185. New York: Wiley, 1978.
17. Weinberg, Bella H. Bibliographic Coupling: A Review. Information Storage
and Retrieval 10(May-June 1974):189-96; and Bellardo, Trudi. The Use of Co-Citations
to Study Science. Library Research 2(Fall 1980):231-37.
18. Small, Henry, and Griffith, Belver C. The Structure of Scientific Literatures. I:
Identifying and Graphing Specialties. Science Studies 4(Jan. 1974):17-40.
19. Weinstock, Melvin. Citation Indexes. In Encyclopedia of Library and Information Science, vol. 5, edited by Allen Kent, et al., pp. 16-40. New York: Marcel Dekker,
1971.
20. Garfield, Eugene. The New ISI Journal Citation Reports Should Significantly
Affect the Future Course of Scientific Publication. In ________, Essays, vol. 1, pp.
473-74.
21. Small, Henry, and Greenlee, Edwin. A Citation and Publication Analysis of U.S.
Industrial Organizations. Philadelphia: Institute for Scientific Information, 1979.
22. Abt, Helmut A. The Cost-Effectiveness in Terms of Publications and Citations
of Various Optical Telescopes at the Kitt Peak National Observatory. Publications of the
Astronomical Society of the Pacific 92(June 1980):249-54.
23. Griffith, Belver C., et al. On the Use of Citations in Studying Scientific Achievements and Communication. Society for Social Studies of Science Newsletter 2(Summer
1977):9-13; and Garfield, Citation Indexing, pp. 240-52.
24. Foskett, Anthony C. The Subject Approach to Information. 3d ed. Hamden,
Conn.: Linnet Books, 1977, p. 52.
25. Bottle, R.T. Information Obtainable from Analyses of Scientific Bibliographies. Library Trends 22(July 1973):71.
26. Davies, David. Citation Idiosyncrasies, letter to the editor. Nature 228(26 Dec.
1970):1356.
27. Edwards, Shirley A., and McCarrey, Michael W. Measuring the Performance of
Researchers. Research Management 16(Jan. 1973):34-41.
28. Bayer, Alan E., and Folger, John. Some Correlates of a Citation Measure of Productivity in Science. Sociology of Education 39(Fall 1966):381.
29. Thorne, Frederick C. The Citation Index: Another Case of Spurious Validity.
Journal of Clinical Psychology 33(Oct. 1977):1157-61.
30. Virgo, Julie A. A Statistical Procedure for Evaluating the Importance of Scientific Papers. Library Quarterly 47(Oct. 1977):415-30; McAllister, Paul R., et al. Comparison of Peer and Citation Assessment of the Influence of Scientific Journals. Journal of
the ASIS 31(May 1980):147-52; and Smith, Richard, and Fiedler, Fred E. The Measurement of Scholarly Work: A Critical Review of the Literature. Educational Record
52(Summer 1971):225-32.
31. Soper, Mary E. Characteristics and Use of Personal Collections. Library Quarterly 46(Oct. 1976):397-415.
32. Goodell, Rae. The Visible Scientists. Boston: Little, Brown, 1977, p. 4.
33. Barlup, Janet. Relevancy of Cited Articles in Citation Indexing. Bulletin of the
Medical Library Association 57(July 1969):260-63.
34. Garfield, Eugene. Citation Indexes for Science. Science 122(15 July 1955):108.
35. Martyn, John. Bibliographic Coupling. Journal of Documentation 20(Dec.
1964):236.
36. Bertram, Sheila J.K. The Relationship Between Intra-Document Citation Location and Citation Level. Ph.D. diss., University of Illinois at Urbana-Champaign, 1970.
37. Herlach, Gertrud. Can Retrieval of Information From Citation Indexes Be Simplified? Journal of the ASIS 29(Nov. 1978):308-10.
38. Voos, Henry, and Dagaev, Katherine S. Are All Citations Equal? Or, Did We Op.
Cit. Your Idem? Journal of Academic Librarianship 1(Jan. 1976):19-21.
39. Tagliacozzo, Renata. Self-Citations in Scientific Literature. Journal of Documentation 33(Dec. 1977):251-65.
40. Small, Henry G. Cited Documents as Concept Symbols. Social Studies of
Science 8(Aug. 1978):327.
41. Lipetz, Ben-Ami. Improvement of the Selectivity of Citation Indexes to Science
Literature Through Inclusion of Citation Relationship Indicators. American Documentation 16(April 1965):81-90.
42. Chubin, Daryl E., and Moitra, Soumyo D. Content Analysis of References: Adjunct or Alternative to Citation Counting? Social Studies of Science 5(Nov. 1975):423-41;
Frost, Carolyn O. The Use of Citations in Literary Research: A Preliminary Classification of Citation Functions. Library Quarterly 49(Oct. 1979):399-414; Moravcsik, Michael
J., and Murugesan, Poovanalingam. Some Results on the Function and Quality of
Citations. Social Studies of Science 5(Feb. 1975):86-92; Murugesan, Poovanalingam, and
Moravcsik, Michael J. Variations of the Nature of Citation Measures with Journals and
Scientific Specialties. Journal of the ASIS 29(May 1978):141-47; Oppenheim, Charles,
and Renn, Susan P. Highly Cited Old Papers and the Reasons Why They Continue to Be
Cited. Journal of the ASIS 29(Sept. 1978):225-31; and Spiegel-Rosing, Ina. Science
Studies: Bibliometric and Content Analysis. Social Studies of Science 7(Feb. 1977):97-113.
43. Small, Cited Documents, p. 328.
44. Small, Henry G. Co-Citation Context Analysis. Proceedings of the ASIS Annual Meeting 16(1979):270-75.
45. Cole, Jonathan, and Cole, Stephen. Measuring the Quality of Sociological Research: Problems in the Use of the Science Citation Index. American Sociologist 6(Feb.
1971):23-29.
46. Long, J. Scott, et al. The Problem of Junior-Authored Papers in Constructing
Citation Counts. Social Studies of Science 10(May 1980):127-43.
47. Lindsey, Duncan. Production and Citation Measures in the Sociology of
Science: The Problem of Multiple Authorship. Social Studies of Science 10(May
1980):145-62.
48. Garfield, Eugene. What's in a Surname? Current Contents 13(16 Feb. 1981):5-9.
49. Line, Maurice B. The Influence of the Type of Sources Used on the Results of
Citation Analyses. Journal of Documentation 35(Dec. 1979):265-84.
50. Oromaner, Mark J. The Audience as a Determinant of the Most Important
Sociologists. American Sociologist 4(Nov. 1969):332-35.
51. Brittain, J. Michael, and Line, Maurice B. Sources of Citations and References
for Analysis Purposes: A Comparative Assessment. Journal of Documentation 29(March
1973):72-80.
52. Garfield, Eugene. Will ISI's Arts & Humanities Citation Index Revolutionize
Scholarship? In ________, Essays, vol. 3, pp. 204-08.
53. Bates, Marcia J. A Criterion Citation Rate for Information Scientists. Proceedings of the ASIS Annual Meeting 17(1980):276-78.
54. Boyce, Bert R., and Banning, Carolyn S. Data Accuracy in Citation Studies.
RQ 18(Summer 1979):349-50; and Goodrich, June E., and Roland, Charles G. Accuracy
of Published Medical Reference Citations. Journal of Technical Writing and Communication 7(1977):15-19.
55. Cawkell, A.E. Citations as Sociological and Scientific Indicators-A Review.
In EURIM II: A European Conference on the Application of Research in Information
Services and Libraries, edited by W.E. Batten, pp. 31-39. London: Aslib, 1977.
56. Chen, Ching-Chih. A Citation Analysis of the Bulletin of the Medical Library
Association. Bulletin of the Medical Library Association 65(April 1977):290-92.
57. Friis, Th. The Use of Citation Analysis as a Research Technique and Its Implications for Libraries. South African Libraries 23(July 1955):12-15.
58. Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. Hamden,
Conn.: Linnet Books, 1978.
59. Griffith, Belver C., et al. The Structure of Scientific Literatures. II: Toward a
Macro- and Microstructure for Science. Science Studies 4(Oct. 1974):339-65.
60. Small, Henry. The Relationship of Information Science to the Social Sciences:
A Co-Citation Analysis. Information Processing and Management 17(1981):39-50.
61. Goehlert, Robert. A Citation Analysis of International Organization: The Use
of Government Documents. Government Publications Review 6(1979):185-93; O'Connor, Mary A. Dissemination and Use of Library Science Dissertations in the Periodicals
Indexed in the Social Sciences Citation Index. Ph.D. diss., Florida State University, 1978;
and Gibson, Sarah S. Some Characteristics of the Exchange Literature of Regional
Scientific Societies. Library Research 2(Spring 1980-81):75-81.
62. Nelson, Diane M. Methods of Citation Analysis in the Fine Arts. Special Libraries 68(Nov. 1977):390-95.
63. Mancall, Jacqueline C., and Drott, M. Carl. Materials Used by High School
Students Preparing Independent Study Projects: A Bibliometric Approach. Library
Research 1(Fall 1979):223-36; Popovich, Charles J. The Characteristics of a Collection
for Research in Business/Management. College & Research Libraries 39(March
1978):110-17; and Hockings, E.F. Selection of Scientific Periodicals in an Industrial
Research Library. Journal of the ASIS 25(March-April 1974):131-32.
64. Waldhart, Thomas J. Utility of Scientific Research: The Engineer's Use of the
Products of Science. IEEE Transactions on Professional Communication PC-17(June
1974):33-35; and Culnan, Mary J. An Analysis of the Information Usage Patterns of
Academics and Practitioners in the Computer Field. Information Processing and Management 14(1978):395-404.
65. Garfield, Citation Indexing, p. 81.
66. Smith, Linda C. Memex as an Image of Potentiality in Information Retrieval
Research and Development. In Information Retrieval Research. London: Butterworths,
1981; and Ruff, Imre. Citation Analysis of a Scientific Career. Social Studies of Science
9(Feb. 1979):81-90.
67. Ellis, P., et al. Studies on Patent Citation Networks. Journal of Documentation 34(March 1978):12-20.
68. Small, Henry G. Structural Dynamics of Scientific Literature. International
Classification 3(1976):67-74.
69. ________. A Co-Citation Model of a Scientific Specialty. Social Studies of
Science 7(May 1977):139-66.
70. ________. Co-Citation Context Analysis and the Structure of Paradigms.
Journal of Documentation 36(Sept. 1980):183-96.
71. Shepherd, Robert G., and Goode, Erich. Scientists in the Popular Press. New
Scientist 76(24 Nov. 1977):482-84.
72. Narin, Evaluative Bibliometrics, p. 334.
73. Aaronson, Steve. The Footnotes of Science. Mosaic 6(March/April 1975):22-27;
and Wade, Nicholas. Citation Analysis: A New Tool for Science Administrators.
Science 188(2 May 1975):429-32.
74. Salton, Gerard. Associative Document Retrieval Techniques Using Bibliographic Information. Journal of the ACM 10(Oct. 1963):440-57.
75. Gray, W.A., and Harley, A.J. Computer Assisted Indexing. Information
Storage and Retrieval 7(Nov. 1971):167-74; Kwok, K.L. The Use of Title and Cited Titles
as Document Representation for Automatic Classification. Information Processing and
Management 11(1975):201-06; Price, Nancy, and Schiminovich, Samuel. A Clustering
Experiment. Information Storage and Retrieval 4(Aug. 1968):271-80; Schiminovich,
Samuel. Automatic Classification and Retrieval of Documents by Means of a Bibliographic Pattern Discovery Algorithm. Information Storage and Retrieval 6(May
1971):417-35; Bichteler, Julie, and Parsons, Ronald G. Document Retrieval by Means of
an Automatic Classification Algorithm for Citations. Information Storage and Retrieval
10(July/Aug. 1974):267-78; Bichteler, Julie, and Eaton, Edward A. Comparing Two
Algorithms for Document Retrieval Using Citation Links. Journal of the ASIS 28(July
1977):192-95; and ________. The Combined Use of Bibliographic Coupling and
Cocitation for Document Retrieval. Journal of the ASIS 31(July 1980):278-82.
76. Yermish, Ira. A Citation-Based Interactive Associative Information Retrieval
System. Ph.D. diss., University of Pennsylvania, 1975.
77. Chapman, Janet, and Subramanyam, K. Cocitation Search Strategy.
In National Online Meeting Proceedings-1981, compiled by Martha E. Williams and
Thomas H. Hogan, pp. 97-102. Medford, N.J.: Learned Information, 1981; and White,
Howard D. Cocited Author Retrieval Online: An Experiment with the Social Indicators
Literature. Journal of the ASIS 32(Jan. 1981):16-21.
78. Garfield, Eugene. ISI's On-line System Makes Searching So Easy Even a Scientist Can Do It. Current Contents 13(26 Jan. 1981):5-8.
79. O'Connor, John. Citing Statements: Recognition by Computer and Use to
Improve Retrieval. Proceedings of the ASIS Annual Meeting 17(1980):177-79.
80. Cayless, C.F. Journal Ranking and Selection, letter to the editor.
Journal of Documentation 33(Sept. 1977):243.
81. Gross, P.L.K., and Gross, E.M. College Libraries and Chemical Education.
Science 66(28 Oct. 1927):385-89; and Garfield, Eugene. Citation Analysis as a Tool in
Journal Evaluation. Science 178(3 Nov. 1972):471-79.
82. Brodman, Estelle. Choosing Physiology Journals. Bulletin of the Medical
Library Association 32(Oct. 1944):479-83.
83. Pritchard, Alan. Citation Analysis vs. Use Data, letter to the editor. Journal of
Documentation 36(Sept. 1980):268-69.
84. Singleton, Alan. Journal Ranking and Selection: A Review in Physics. Journal
of Documentation 32(Dec. 1976):258-89.
85. Line, Maurice B., and Sandison, Alexander. Practical Interpretation of
Citation and Library Use Studies. College & Research Libraries 36(Sept. 1975):393-96.
86. Line, Maurice B. On the Irrelevance of Citation Analyses to Practical Librarianship. In EURIM II, pp. 51-53; and ________. Ranked Lists Based on
Citations and Library Uses as Indicators of Journal Usage in Individual Libraries.
Collection Management 2(Winter 1978):313-16.
87. Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship 7(1977):328.
88. Dhawan, S.M., et al. Selection of Scientific Journals: A Model.
Journal of Documentation 36(March 1980):24-32.
89. Kriz, Harry M. Subscriptions vs. Books in a Constant Dollar
Budget. College & Research Libraries 39(March 1978):105-09.
90. See Bavelas, Janet B. Comments on Buss's Evaluation of Canadian Psychology
Departments. Canadian Psychological Review 17(Oct. 1976):303; and Kaplan, Abraham.
The Conduct of Inquiry: Methodology for Behavioral Science. San Francisco: Chandler,
1964, p. 28.
91. Kaplan, Norman. The Norms of Citation Behavior: Prolegomena to the
Footnote. American Documentation 16(July 1965):179-84.
92. Ghosh, Jata S., and Neufeld, M. Lynne. Uncitedness of Articles in the Journal of
the American Chemical Society. Information Storage and Retrieval 10(Nov./Dec.
1974):365-69; Ghosh, Jata S. Uncitedness of Articles in Nature, A Multidisciplinary
Scientific Journal. Information Processing and Management 11(1975):165-69; Garfield,
Eugene. Uncitedness III-The Importance of Not Being Cited. In ________, Essays,
vol. 1, pp. 413-14; and Kuch, T.D.C. Predicting the Citedness of Scientific Papers:
Objective Correlates of Citedness in the American Journal of Physiology. Proceedings of
the ASIS Annual Meeting 15(1978):185-87.
93. Gilbert, G. Nigel. Referencing as Persuasion. Social Studies of Science 7(Feb.
1977):113-22.
94. Garfield, Eugene. Is Information Retrieval in the Arts and Humanities
Inherently Different From That in Science? Library Quarterly 50(Jan. 1980):56.
95. Margolis, J. Citation Indexing and Evaluation of Scientific Papers.
Science 155(10 March 1967):1213-19.
96. Price, Derek J. de Solla. Ethics of Scientific Publication. Science
144(8 May 1964):655-57.
Obsolescence
D. KAYE GAPEN
SIGRID P. MILNER
OBSOLESCENCE HAS BEEN DEFINED by Line and Sandison as the decline
over time in validity or utility of information. This concept is of
obvious interest to information theoreticians who concern themselves
with the development, career and eventual death or incorporation of
particular kinds of information. But it is also of interest to practical
librarians who administer growing collections in finite spaces. Such
librarians look to research on obsolescence to help them decide which
items to keep and which to store or discard in order to make room for
new acquisitions. Ideally for remote storage or discarding, research on
obsolescence would culminate in simple mathematical formulas which
could be applied with equal success to any and all libraries. Obsolescence research has produced many mathematical formulas, but unfortunately they have been neither simple nor universally applicable. The
best researchers are the ones who have admitted that obsolescence is a far
more complicated and more hypothetical concept than we have hoped.
Only that research which has been transmogrified into bibliofolklore ("journals can be discarded after seven years"; "everyone
knows chemistry books become obsolete more slowly than physics
books") is simple, and it is generally incorrect as well, either in expression or application.
The concept of obsolescence has itself suffered a decline in fashion
such as may be responsible for apparent obsolescence of information in
D. Kaye Gapen is Dean, University Library, University of Alabama, Tuscaloosa, and
Sigrid P. Milner is Personnel Intern, Iowa State University Library, Ames.
cal results still leave working librarians with the problem of determining which individual volumes are not being used, a problem not
necessarily made easier by increasing automation of the circulation
system. But initially, the decisions of which volumes to store or discard
were made qualitatively by experts, either faculty members or specialist
librarians. Given the effect of storage upon use, the selections became a
self-fulfilling prophecy. Stored on the assumption that they would be
less used, they were less used, perhaps because of their uselessness,
perhaps because of the deterrent effect of their storage.
Some recent literature has attempted to reproduce the judgments of
experts through mechanical or formulaic means without paying too
much attention to the actual validity of the judgments. Fussler and
Simon, for example, found that by analyzing functions of past use,
publication date, and language, they could achieve almost unanimous
agreement with the faculty experts in chemistry and economics.' Past
use was an especially significant predictor of future use. But in English
literature and Germanic literature, there was great disagreement
between the experts' opinion and any of the functions. It is a little hard
to see why this is true, if in fact scientists use chiefly more recent material
which would have no past use, while scholars in the humanities use
chiefly older material with a much longer history of use; yet none of the
three factors was an accurate predictor of use. Seymour concluded that
although weeding by means of past circulation was most efficient, it was
also disproportionately most costly because of gathering the data and
changing the individual records. Weeding by publication date or age
was least efficient because some heavily used books were stored; yet
because of the ease of implementation, this method may be the most
cost-effective. A two-tiered system might become possible with such a
weeding program, and indeed might be informally put into effect by
alert pagers: the most frequently recalled stored volumes might be left in
a particular area or on a shelf more easily accessible than the general
storage area. It is unfortunate that academic libraries are not more
committed to continuous derivation of use data about their collections.
A great deal of such data could be easily gathered through the automated
circulation systems many universities now have, and would provide
practical grist for the theoretical mill. Unfortunately, too many automated systems were brought up without much concern for their research
possibilities.
In the second part of her article, Seymour pointed out that serials,
being a different format from monographs, also had a different use, especially greater in-house use. One of the biggest problems in the body
of literature about obsolescence is how to deal with in-house use. Some
studies have shown that in-house use is similar to, but greater than,
circulation. This finding will be discussed later, but even if we accept it
at face value here, it does not solve the problem for the many libraries
with noncirculating periodicals. The research has relied chiefly on
citation data to identify individual volumes or entire runs of journals
for relegation to storage. As Sandison has pointed out, citation data do
not refer to any particular library; therefore, they do not shed light on
local use patterns or local user populations. Studies by publication date,
language, number of libraries holding the serial, position on ranked
lists, and other functions demonstrate that past use is again the best
predictor of future use. Fussler and Simon have detected a family
quality in volumes of a serial. This means that the use patterns of the
entire serial set are alike, and the whole run should be stored or retained.
It is not clear how the effect, if any, of various kinds of special issues (the annual bibliographic issue, for example, or a single-theme issue) was allowed for, or what effect reprinting and photocopying have on
journal use. Researchers have devised a half-life value for scientific
journal articles. As Seymour pointed out, it might better be termed the
median citation age, since it represents the point at which half of all the
citations to an article which are going to be made have been made. The
use of this figure is not immediately apparent, since one would not wish
to discard or store a volume which had half its useful life still ahead. No
judgment can be made as to whether the first half or the second half of
the citations is more valuable; only that the first half is likely to come
more quickly. Some researchers believe that all journals older than a
certain date should be stored, while others find storage of entire runs
better, particularly if subscriptions have been canceled.
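The half-life, or median citation age, can be illustrated with a short calculation. What follows is only a minimal sketch: the function name and the yearly citation counts are hypothetical and are not drawn from any of the studies reviewed here. It also assumes that the full citation history of the article is already known, which is true only in retrospect.

```python
def median_citation_age(citations_per_year):
    """Return the age in years (starting at 1) by which at least half of
    all recorded citations to an article have been made.

    citations_per_year[i] is the number of citations received in year
    i + 1 after publication. The data are hypothetical.
    """
    total = sum(citations_per_year)
    if total == 0:
        raise ValueError("no citations recorded")
    running = 0
    for age, count in enumerate(citations_per_year, start=1):
        running += count
        # Stop at the first year in which the running total reaches half.
        if 2 * running >= total:
            return age


# Hypothetical article: citations in years 1 through 8 after publication.
counts = [2, 10, 12, 8, 4, 2, 1, 1]
half_life = median_citation_age(counts)
remaining = sum(counts[half_life:])  # citations still to come after that age
```

In this invented case a substantial share of the citations falls after the median age, which is the reason the figure by itself gives little guidance on when a volume may safely be stored.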
A second review article, by Line and Sandison, strikes at the heart of
some easily made assumptions about obsolescence. They discuss a
number of reasons for changes in the use of literature over time. The
information which the literature contains may be invalid, or may be
valid but incorporated in or superseded by later work. Most interesting
of all is the case where information is valid but in a field of declining
interest or fashionableness. In each of these cases, the literature will
experience a decline in use. Much of the literature will still be of interest
to the historian of the field, even if it contains invalid information, but
use of the information qua information will decrease. In some cases, use
of literature can increase. For example, if the information was formerly
considered invalid but is later recognized as valid, if a lag in technology
or theory delays exploitation of valid information (as was the case with
movable type, for instance), or if the information is valid and in a field of
increasing interest or fashionableness, then in each of these cases the
literature will experience an increase of use. Too many researchers have
ignored the interplay of these complex factors and settled for a simple
model of linear or exponential obsolescence.
A further theoretical problem which Line and Sandison brought
out is that although information and knowledge are recorded and
communicated in documents, the relationship between document use
and information validity is by no means a direct one. A document which
is difficult to obtain may be less used although the information is
potentially useful. They stated definitely that what has been considered
the law of obsolescence-decline of use over time-is in fact nothing
more than a hypothesis still to be tested. Apparent obsolescence may be
due to a number of irrelevant factors. Literature can be used in two
different ways: for current awareness and for a basic search on some
particular topic. Obviously new literature, and perhaps especially new
journals of a particular type, will be used for both these purposes. Older
literature and archival journals will be used chiefly in the second way.
This differentiation in type of use might account for part of the obsolescence curve. The growth of literature also could affect the results.
One way in which literature has grown is in the tremendous increase in
number of publications. So many more monographs and journals are
being published now that even if the percentage that was being used
were no greater, the absolute number would be many times greater.
Other possible factors are the increase in number of journal articles per
issue, length of article or monograph, number of footnote citations or
references per article or monograph. It appears that no researcher has
attempted to come up with a statistical corrective to any bias which these
factors might introduce. One study suggested that it would be possible
to subtract literature growth (discovered by counting articles) from
apparent increase in use of more recent literature, thus deriving actual
increase, but did not actually do such a computation. In any case,
merely counting articles would probably not result in a sophisticated
adjustment factor.
The relationship between citations or references and use is another
uncertainty. Thesis advisers have long been aware of the purely ceremonial reference, made to a venerable but unused source. Similarly,
some sources are actually used in the production of research articles but
are not cited because of editorial restrictions or unwillingness to indicate indebtedness to such a source. Some uses of current-awareness tools
may lead only indirectly or not at all to research results; yet who is to say
that published research is the only use to which information can be
validly put? Journals dealing with the teaching of a particular university subject might only rarely be cited in core journals, but they might
be read and acted upon by many. This, of course, gets at the fundamental question, What do we mean by use?
A final basic point raised by Sandison and Line is the often ignored
distinction between synchronous and diachronous use studies. Most
studies are synchronous, since diachronous ones are time-consuming
and difficult to do; but researchers have shown that synchronous and
diachronous results need not be the same, and that in certain cases they
are markedly different. Synchronous studies are those which compare
use at a particular time to the age of the items. They might, for instance,
plot the publication dates of all items charged out from a library during
a particular period, even a lengthy period as was done in the University
of Pittsburgh study. Or they might analyze the publication dates of cited
sources for serial articles in a given year or years. Basically, such studies
look backward from a point in present time. But what we are interested
in for weeding is the use that individual titles will receive in the future.
Here a diachronous study is necessary, one which follows particular
books or articles through their useful life span. Ideally, a study like this
would trace an entire collection through its total uses, or rigorous
sampling methods could authenticate less comprehensive studies. In
practice, diachronous studies tend to be like the Fussler and Simon
study which compared the use of particular books in two five-year time
periods. A diachronous study looks forward from publication date to
the use a book will receive, and is therefore more reflective of the future
use of similar books. Diasynchronous studies would also be possible
which would compare two statistically related synchronous studies, but
such research has been rare. Line and Sandison warned that studies
based on the various citation sources must take into account fluctuations in coverage of the source, such as occurred with the first years of
Science Citation Index.
Other Articles
The research since these review articles has been based on three
chief sources of data: citation studies, use studies based on circulation,
and use studies based on reshelving statistics. Sandison's article on
physics journals used the same data as an earlier study by Chen.12 The
raw data presented by Chen for the use of 138 physics journals at
Massachusetts Institute of Technology (MIT) showed a rapid decrease
in use as the journal aged, but she failed to allow for the relationship of
numbers of items used to numbers available for use, in this case, meters
of shelf space. This correction for density produces quite a different
picture revealing no decline in use. Of the ten most frequently used
journals, eight conventional journals showed a peak use at twelve to
sixteen years, while two journals of advance publication peaked at six to
seven years. Further use data from the British Lending Library confirmed these findings, according to Sandison.13
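The correction for density amounts to dividing the number of uses in each age band by the amount of material available in that band. The sketch below assumes hypothetical figures for three age bands. The numbers are invented to show the arithmetic and are not those of Chen or Sandison.

```python
def use_density(uses, shelf_meters):
    """Uses per meter of shelf for each age band (hypothetical data)."""
    if len(uses) != len(shelf_meters):
        raise ValueError("inputs must have the same length")
    return [u / m for u, m in zip(uses, shelf_meters)]


# Hypothetical age bands, newest first. Older bands occupy less shelf
# space because fewer and thinner volumes were published in those years.
raw_uses = [120, 60, 30]
meters = [8.0, 4.0, 2.0]
density = use_density(raw_uses, meters)
```

Here the raw counts halve from one band to the next, yet the uses per meter are identical in all three bands, so the apparent decline reflects only how much older material exists.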
In 1975, Sandison collaborated on an article with Line to point out
information needed before citation and library use studies would be of
practical help in libraries.14 They mentioned such things as the relative
size of journals, which they considered important enough to be made a
special project of some national library; uses per subscription cost; uses
per article; recalls per keyword; and so on. Only when citation and use
studies take these factors into account will they be of any use either to
librarians making decisions about journal subscriptions, discarding
and binding, or to information system designers selecting material to
scan and items to include in an information system.
Taylor, too, sought a practical solution, this time to weeding,
partly in response to the earlier Seymour article.15 He discussed the
benefits and problems of a weeding program, suggesting (as mentioned
earlier) that obsolete material on the shelves can permanently discourage patrons. He compared subjective with objective criteria as the basis
for weeding decisions, and finally attempted to formulate a method for
identifying those periodical volumes which should be stored. The basis
for such a method could be reshelving data, citation data, photocopying
data, circulation data, or national loans data. The Newcastle research
revealed that a reshelving study nets only 20-25 percent of actual in-house use; and that even with saturation propaganda concerning the
study to prevent user reshelving, it was only possible to raise the level to
40 percent. His general formula was the 15/5 rule: a journal is a
candidate for storage if none of the last fifteen years of the journal has
circulated during the last five years. He excluded recent subscriptions
with fewer than five volumes received, and altered the rule somewhat for
titles in the humanities and discontinued titles. Nevertheless, this rule
should be of help to those libraries which circulate periodicals. It is
expressed in a fashion different enough so that it does not oversimplify
the complexity of obsolescence, although it offers some aid to weeders.
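One possible reading of the 15/5 rule can be written out as a short test. This is a sketch under stated assumptions, not Taylor's own procedure: the function name, the record layout, and the treatment of the boundary years are hypothetical, and the special adjustments for humanities and discontinued titles are left out.

```python
def is_storage_candidate(loans, current_year, volumes_received):
    """Apply a simplified reading of the 15/5 rule to one journal title.

    loans: iterable of (volume_year, loan_year) pairs, one per loan.
    Returns True if no volume from the last fifteen years was lent
    during the last five years.
    """
    if volumes_received < 5:
        return False  # recent subscription: excluded from the rule
    for volume_year, loan_year in loans:
        recent_volume = current_year - volume_year < 15
        recent_loan = current_year - loan_year < 5
        if recent_volume and recent_loan:
            return False
    return True


# Hypothetical title: one old volume lent recently, one newer volume
# lent long ago. Neither loan blocks storage under this reading.
history = [(1950, 1979), (1970, 1972)]
candidate = is_storage_candidate(history, current_year=1981, volumes_received=30)
```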
Bulick and his associates, in what was termed a historical
approach, used preliminary data from the University of Pittsburgh
study to analyze the use of materials acquired in 1969.16 They found that
first-time use was greatest in the year of acquisition (1969), consistently
falling off after that until 1974, the last year for which data were
presented. By 1974, 56 percent of the acquisitions had been used at least
once. There was a similar drop in number of times circulated, so that the
largest percentage of items (about 14 percent) circulated once each, and
to compare circulation data with a random shelf-list sample and a desk
sample of those books left unshelved.
Gosnell's 1944 article was reprinted in summer 1978, with an
editor's note which observed that earlier studies on obsolescence had not
been followed up. The editor stated that at the time he knew of no
library which continuously derived, reviewed and incorporated obsolescence data;22 and we know of no such library at this time. Gosnell based
his study on the analysis of three book lists recommended for college
library acquisitions. He was able to demonstrate that newer and more
recent books were preferred by the makers of these lists, and postulated
the existence of an average book mortality which could be applied to
all books in general, as life insurance mortality tables apply to all
members of the population. He found that various subjects in the three
lists had an obsolescence rate of from 1.5 to 31.3, with the overall
averages being 8.1, 8.4 and 9.6. Gosnell then analyzed the holdings of
five college libraries and found generally lower obsolescence rates, i.e., a
greater percentage of older titles. This was particularly true in the
classics, where two libraries had a negative obsolescence rate, signifying
a preponderance of older material. An analysis of circulation at Hamilton College showed a much lower obsolescence rate, about 4.9 overall.
Gosnell suggested that these obsolescence ratings could be used for
accreditation purposes.23 They might also have significance for departmental book budgets: a field with a lower obsolescence rate might be
able to get by with a smaller budget than a more rapidly obsolescing
field, or conversely, a book purchase in a field with lower obsolescence
might be more cost-effective since it could be used for a longer period.
Bronmo put greater emphasis on the importance of literature
expansion.24 He called for diachronous studies which would prove or
disprove the possibility that apparent obsolescence is merely a function
of the growth of the literature. He studied the use of books on literary
criticism at the University Library of Tromso and found that for books
published after 1945, date of publication was not a significant predictor
of use. He admitted, however, that his results would probably not apply
to other libraries, although he theorized that more significant works in
literary criticism had been published between 1950 and 1954. His studies
excluded any books which he believed to be noncirculating because no
one lectured on those authors or wrote a thesis about them during the
year of his research. His conclusion was that bibliometric studies very
seldom have any immediate results.25
Perhaps the most famous recent study of obsolescence has been the
Kent study at the University of Pittsburgh.26 The purpose of the study
was to develop measures for determining the extent to which library
materials are used and what the costs are, to improve acquisitions
decisions, and to determine storage or discarding points at which alternatives to local ownership of various items became feasible. The
research was carried on over a period of seven years from 1968 to 1975
and was based chiefly on circulation statistics, in-house use sampling,
and journal use sampling at six science libraries. They found that 39.8
percent of the books acquired in 1969 did not circulate by 1975. Of those
that did circulate, 72.76 percent were borrowed during the year of
acquisition or the following year. The circulating items represented 75
percent of the titles used in-house, 99.6 percent of the outgoing interlibrary loans, and 98.1 percent of the reserve collection. They determined
that 54.2 percent of the 1969 purchases should not have been made if two
uses were considered cost-effective; 62.5 percent, if three uses. Unfortunately, most libraries have not yet determined how many uses of a book
are cost-effective. The Pittsburgh reshelving study found that 24.86
percent of books used in-house had never circulated and 43 percent did
not circulate within the sample time period or within the year following
the sample period. The researchers concluded that 75-78 percent of the
in-house books did circulate externally and, therefore, that external
circulation data provided a sufficiently accurate reflection of use.
Journals at the six science libraries generally had low use, except in
the physics library, where the librarian had aggressive marketing
techniques. Interestingly, photocopying of journals increased 13 percent after the first two years following publication, and increased a
further 11 percent after fifteen years. The proposed weeding rule derived
from all these data stated that an item should not be weeded before it is
seven years old, and only items which have not circulated should be
weeded after the age of seven.
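Stated as a test, the proposed rule has two conditions that must both hold. The sketch below is only an illustration. It assumes that age is counted from the year of acquisition, which the summary above does not specify, and its function and parameter names are hypothetical.

```python
def may_be_weeded(acquisition_year, circulation_count, current_year, min_age=7):
    """Return True if an item meets both conditions of the proposed rule:
    it is at least min_age years old and it has never circulated.

    Counting age from the acquisition year is an assumption.
    """
    age = current_year - acquisition_year
    return age >= min_age and circulation_count == 0


# Hypothetical items examined in 1981.
old_unused = may_be_weeded(1969, 0, 1981)   # old enough, never lent
old_used = may_be_weeded(1969, 2, 1981)     # old enough, but has circulated
new_unused = may_be_weeded(1978, 0, 1981)   # never lent, but too recent
```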
Summary
a concept which cannot be usefully applied outside of the sciences.
Published articles need to be more informative about methodology, not
just giving results. In many cases, it is impossible to discover if the
reserve and reference collections are included in or excluded from the
percentages, an apparently small factor which could have a disproportionately large effect on the results. We need to consider what is meant
by use, and whether we can assign different values to different uses by
different populations, or whether we believe (or prefer to act as if we
believe) that all uses are equal. Should discarding be adjusted for irregularities in the curriculum, as Bronmo did when he excluded literary
criticism not circulating because no professor lectured on those authors
during that year? If no, the library may respond drastically to temporary
valuations. If yes, the library may be failing to respond quickly enough
to shifts in research fields. Many studies have been motivated by a need
to discard something and have been interested only in what should be
discarded, not in an ideally objective research model. This paper has
already indicated the problems of differentiating between synchronous
and diachronous studies, and the greater usefulness, as well as difficulty,
of the latter. It has been assumed that circulation reflects in-house uses
as well, but that may be inaccurate. Kent stated that 75 percent of the
titles used in-house had circulated during the sample period;27 this
leaves one in four of the in-house uses not reflected in circulation.
Hindle and Buckland noted that the number of nonrecorded in-house
uses in a study at Newcastle-upon-Tyne Polytechnic Library was twenty
times the number of recorded uses.28 They also found that reshelving
nets 20-25 percent of in-house use, which can be raised to 40 percent by
saturating the area with propaganda about the reshelving study. Clearly
we need an accurate way to determine in-house use before we can
conclude that it is reflected in external circulation records. In addition,
we need research on the extent to which planned or random factors in
the library can affect obsolescence. How much can libraries affect use of
material by layout and stack arrangement, by marketing techniques,
by storage, by cancellation of journal subscriptions, or initial failure to
buy? All these areas must be far more thoroughly researched before we
can claim to understand obsolescence.
Implications
And what has all of this meant to the librarian in the field? Unfortunately, not much. Not only is the concept of the obsolescence of literature and its implications for weeding and purchasing a touchy, political
issue, but the almost contradictory results of the research done to date
have only clouded the issue further.
First, the problems with the research completed thus far include the
failure to build upon past research in either disproving or proving older
hypotheses; there has not evolved a body of agreed-upon definitions nor
a common vocabulary; data gathering in a variety of library situations is
not done consistently; the mathematical nature of the theoretical work
is generally unclear to most practicing librarians; and because there is
no model or methodology which can be applied by librarians as part of
the ongoing library operation, obsolescence is not a topic often chosen
by librarians for consideration as a research or management activity.
Indeed, the evidence available thus far supports almost any course of
action because the research results are contradictory and ungeneralizable. As Line and Sandison point out, we have not yet even proven the
validity of the concept of obsolescence. Even if one disagrees with Line
and Sandison, every other study speaks strongly to the necessity for
investigation in each individual library to determine local and ad hoc
use peculiarities. And so librarians make decisions every day about what
to buy, what to store and what to discard, relying on their own
judgment.
Second, the significant question could be asked (and is raised by
some of those whose research is reported here) as to whether the effort
required in undertaking use studies, or in gathering other obsolescence
data, justifies the time and effort required. Not only would it take more
time than is now invested in maintaining awareness of collection use,
but there is no guarantee that the results could be applied any more
consistently nor be more beneficial. Most librarians are not yet convinced that this is a viable or more than peripheral topic.
Third, while the theoretical and mathematical nature of obsolescence can be investigated away from the library environment, the proof
or disproof of the theorems lies within the library doors, and it is
unfortunately often the case that the researcher and the librarian (if not
the same person) are not in sympathy with one another. We are all
familiar enough with this phenomenon to know that little credence will
be ascribed to research activity when some of the people affected have
not bought into the methodology and its results. This is particularly
true for a topic such as obsolescence, in which mathematical and theoretical skills must be linked to an intimate awareness of local library
idiosyncrasies, past practice and past selection practices.
A final reason why research results have had only limited application is that this area of library operations (buying, storing, discarding) is
one of the most uncertain and risky when we consider the implications
of incorrect actions. Not only are users denied immediate access to
desired information, but it is becoming increasingly difficult to fill in
gaps in the collection because of such factors as shorter print runs, etc.
Even the studies that are successful mathematically have not been able
to arrive at an algorithm or a guideline indicating which particular
book or volume or issue is the one which will or will not be used.
Human nature usually responds to situations involving high risk and
uncertainty in as safe a manner as possible. In this instance, it means
relying on one's own judgment in assessing the political and practical
realities rather than on some researcher's incomprehensible mathematical recommendations.
Today's Circumstances
policy committees, or in networks organized for other cooperative
endeavors. What is proposed here is a broad outline of how the model
might look and be applied. The purpose is to gather as complete and
consistent data as possible for a spectrum of libraries. In the case of
obsolescence there are two main questions which can be proposed. First,
what are the use patterns in libraries, and how can that use be ascertained? Second, what are the causal factors which interact to produce
those use patterns? In relation to the latter, we have been relying on
random influences, assuming they balance one another out, to produce
a quantitative ranking. But, as book publishers know, publicity, location, and even color of book jacket can affect use. Marketing in
libraries is another element which can affect use.
Other causal factors might include questions as to why and how
people do research. For example, concepts of the research project seem
to change during the course of research through refining and discarding
unusable topics. How would this pattern affect the use of materials in
libraries? One purpose of the model would be to distinguish true information use patterns from those information use characteristics resulting from local library policies, national policies and publisher
marketing policies.
also be evaluated financially. Finally, it would provide guidelines for
altering statistic-keeping practices in order for standard statistics to be
implemented in a library and then brought together on a more comprehensive scale.
Once the model is constructed and tested, its application would not
only become part of the library's ongoing operation, but it would also
involve librarians and researchers in other sorts of information gathering activities as appropriate, particularly in the behavioral sciences and
information sciences aspect of the question. Results would regularly be
analyzed within the local library context, and those results and analyses
passed on to a larger analytical body for analysis and possible further
refinement of the model. Implementation of this model would provide
not only more sophisticated management of library operations, but also
information essential to the understanding of how libraries are used and
how information is used.
Conclusion
In conclusion, while the practical results of the obsolescence
research done to date are of little value or use in daily library operations,
many of the points under consideration are vital to ensuring the viability of library operations and are worthy of new consideration. Moreover,
the critical nature of today's library world makes it imperative that
librarians attempt a new approach to the management of library operations, including the investigation of the essentials upon which library
service is based. The construction of a series of comprehensive models
which can combine research with a library's ongoing activities will
begin to produce the information, data and quality library service
which can ensure that libraries continue to play an active role in the
information transfer process. If nothing more, the obsolescence research
done to date demonstrates that research must meet reality, and it is now
incumbent upon us as librarians and researchers to ensure that that
meeting is cordial, provocatively positive, and enhancing.
References
1. Line, Maurice B., and Sandison, Alexander. Obsolescence and Changes in the
Use of Literature with Time. Journal of Documentation 30(Sept. 1974):283.
2. Gosnell, Charles F. Obsolescence of Books in College Libraries. College &
Research Libraries 4(March 1944):115-25.
3. Evans, Glyn. Introduction to Obsolescence of Books in College Libraries, by
J. TAGUE, J. BEHESHTI & L. REES-POTTER
[Figure legend: at least routine; at least significant; at least important]
[Figure: cumulative data with fitted exponential, logistic, and linear curves; horizontal axis: year]
If S(t), I(t) and R(t) represent, respectively, the number of susceptibles, infectives, and removals at a point in time t, then the change in these
functions can be described by a set of differential equations and a
threshold level determined for the number of susceptibles required to
produce an epidemic. The constants in these equations represent the
rate of infection, the rates at which susceptibles and infectives are
removed, and the rates at which new supplies of infectives and susceptibles enter the population. The model has been applied to the research
literature of mast cells;11 schistosomiasis, 1862-1962;12 symbolic logic,
1847-1962;13 and polywater, 1962-74.14 The curves for the first two literatures display the usual exponential pattern; symbolic logic literature is
cyclic, with peaks in 1907, 1932 and 1957; and polywater literature has a
single peak in 1970.
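A system of this kind is often written in the following form. This is offered only as a sketch consistent with the constants listed above. The symbols are assumptions introduced here for illustration: \beta for the rate of infection, \delta and \gamma for the rates at which susceptibles and infectives are removed, and \mu and \nu for the rates at which new susceptibles and infectives enter the population.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I - \delta S + \mu, \\
\frac{dI}{dt} &= \beta S I - \gamma I + \nu, \\
\frac{dR}{dt} &= \delta S + \gamma I.
\end{aligned}
```

Under these assumptions, and in the simplest case where no new infectives enter (\nu = 0), the number of infectives grows only when \beta S I > \gamma I, that is, when S exceeds the threshold \gamma / \beta.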
The epidemic model is difficult to evaluate because of the indefiniteness in its presentation and applications. In no case are all three
functions S(t), I(t) and R(t) stated explicitly as functions of time,
although an exponential form is suggested for I(t). Also, the constants
required in the differential equations are not all estimated from the
empirical data. The impression is that any kind of cyclic or exponential
growth pattern is compatible with the epidemic model.
One general problem in describing the literature growth of a subfield is that it is difficult to determine when the subfield first arose from
its originating field. As Menard has pointed out, indexes and abstract
journals do not ordinarily create new classes or subheadings until after
the first 100 or so papers have appeared. Eventually, if the subfield
becomes very large, it will split into two or more subfields. Increasing
specialization is the response of scientists to an increasing literature
burden. However, recent investigations by Small indicate it may be
possible to identify specialties by means of cocitation-based content
analysis.15
The Evidence
What is the evidence for exponential growth? The answer depends
on what one is counting and when.
Knowledge growth may mean literature growth-increase in the
number of publications in a field-or information growth-increase in
the number of ideas in the field. As Gilbert16 has pointed out in connection with indicators of scientific growth, the use of the former as a
measure of the latter assumes, first, that all knowledge is contained in
the published literature, and second, that every paper contains an equal
amount of knowledge.
curves, but the constant factor (a in equation 1) will change. For example, applying May's method to the annual noncumulated output for
Chemical Abstracts 1907-79, one obtains the exponential curve:
f(t) = 12,061 e^{0.046(t-1906)}
If this function is integrated from -∞ to 1907, the estimated cumulated
number of chemical publications prior to 1907, i.e., 262,196, is obtained.
This number is then added to the cumulated number of publications
since that time, as determined from Chemical Abstracts counts, to
obtain the data points in figure 1. The three theoretical curves are the
least-squares exponential, linear and logistic fits to these points. The
corresponding functions and multiple squared correlation coefficients
are given in table 1. The squared correlation coefficient represents the
proportion of the variation of cumulated size values which can be
explained by the theoretical function. The algorithm developed by
Oliver was used in an attempt to find a least-squares fit to the logistic
curve, but unfortunately did not converge. The function given is thus
only an approximation to the least-squares solution.
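The arithmetic behind the estimate of earlier publications, and the meaning of the squared correlation coefficient, can be checked with a few lines of code. This is a sketch under stated assumptions. The rate constant is taken as 0.046, a reading of the printed exponent under which the stated total is reproduced when the upper limit of integration falls at t = 1906, that is, at the end of that year. The function names are hypothetical, and the coefficient is computed in its usual form as one minus the ratio of residual to total variation.

```python
import math


def cumulated_before(a, b, t0, upper):
    """Integral from minus infinity to `upper` of a * exp(b * (t - t0)) dt.

    For b > 0 the integral converges to (a / b) * exp(b * (upper - t0)).
    """
    return (a / b) * math.exp(b * (upper - t0))


def r_squared(observed, fitted):
    """Proportion of the variation in `observed` explained by `fitted`."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1 - ss_residual / ss_total


# Constants of the exponential curve; 0.046 is an assumed reading.
a, b, t0 = 12061, 0.046, 1906
prior = cumulated_before(a, b, t0, upper=1906)
```

Rounded to the nearest whole publication, `prior` agrees with the figure of 262,196 given in the text.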
TABLE 1
FUNCTIONS APPROXIMATING THE CUMULATIVE NUMBER OF CHEMICAL ABSTRACTS, 1907-79

Type          Function                                                      R2
Linear        F(t) = -999,000 + 88,013(t-1906)                              0.811
Exponential   F(t) = 282,546.94 e^{[illegible](t-1906)}                     0.995
Logistic      F(t) = 44,751,400 / (1 + 170.743 e^{-[illegible](t-1906)})    0.986
[Figure: Science Abstracts, Biological Abstracts, and Chemical Abstracts plotted by year, 1960-1980]
133
J. TAGUE, J. BEHESHTI
&
L. REES-POTTER
I P S ABSTRACTS
H I S T O R I C A L ABSTRACTS
SOCIOLOGICAL ABSTRACTS
.,a,,eeeee.
1960
1963
1966
1969
1971
197+
1977
1980
YEAR
LIBRARY TRENDS
0
0
4h
0
0
d
ln
Y
E
0
0
fa0
cc
B0
v)
rn
c
0
el0
Oae
0
0
OBO
.
I
0
00
@e0*
YEAR
SUMMER
1981
135
0
0
0
Q
0
0"
0
0
L
0
YEAR
TABLE 2
ANNUAL GROWTH RATE PERCENTAGES FOR ABSTRACTS IN EIGHT ABSTRACTING JOURNALS, 1960-79
Abstract Journal
Science Abstracts
Biological Abstracts
Chemical Abstracts
Psychological Abstracts
Library and Information Science Abstracts
International Political Science Abstracts
Historical Abstracts
Sociological Abstracts
1960-79 Noncumulated
9.0
1970-79
Cumulated
6.2
7.3
19.0
15.4
16.6
17.8
2.0
1.0
4.8
3.5
10.1
10.1
10.2
18.3
6.4
13.2
8.8
9.3
16.6
16.7
9.8
14.4
13.9
13.4
6.7
19.0
3.3
3.3
11.4
8.0
9.7
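An annual growth rate of the kind tabulated here can be estimated from a series of yearly counts. The sketch below uses one common convention, the slope of a least-squares line fitted to the logarithms of the counts. Whether the authors used this convention is not stated in the text, so it is an assumption, as are the function names. The data are the Chemical Abstracts counts for 1960-79 given in the appendix.

```python
import math

# Chemical Abstracts annual counts, 1960-79 (from the appendix).
counts = [134255, 146893, 169351, 171404, 189993, 197083, 220303, 242527,
          232508, 252320, 276674, 308976, 334426, 321005, 333642, 392234,
          390905, 410137, 428342, 436887]
years = list(range(1960, 1980))

def log_linear_slope(xs, ys):
    """Least-squares slope of ln(y) against x."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def annual_growth_percent(xs, ys):
    """Annual percentage growth implied by the fitted exponential."""
    return 100 * (math.exp(log_linear_slope(xs, ys)) - 1)

print(f"{annual_growth_percent(years, counts):.1f}% per year")
```

A different convention, such as comparing only the first and last years, would give a slightly different figure, which is one reason published growth rates for the same service can disagree.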
TABLE 3
SQUARED MULTIPLE CORRELATION COEFFICIENTS FOR LINEAR AND EXPONENTIAL FITS TO CUMULATED NUMBERS OF ABSTRACTS, 1960-79
Abstract Journal
Science Abstracts
Biological Abstracts
Chemical Abstracts
Psychological Abstracts
Library and Information Science Abstracts
International Political Science Abstracts
Historical Abstracts
Sociological Abstracts
Linear Fit
Exponential Fit
0.959
0.995
0.977
0.977
0.937
0.883
0.911
0.925
0.930
0.960
0.923
0.919
0.987
0.954
0.940
0.879
TABLE 4
SQUARED MULTIPLE CORRELATION COEFFICIENTS FOR LINEAR AND EXPONENTIAL FITS TO NONCUMULATED NUMBERS OF ABSTRACTS, 1970-79
Abstract Journal
Science Abstracts
Biological Abstracts
Chemical Abstracts
Psychological Abstracts
Library and Information Science Abstracts
International Political Science Abstracts
Historical Abstracts
Sociological Abstracts
Linear Fit
Exponential Fit
0.913
0.833
0.984
0.922
0.910
0.770
0.982
0.864
0.901
0.898
0.821
0.759
0.884
0.853
0.880
0.784
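The squared correlation coefficients in tables 3 and 4 measure how much of the variation in the counts a fitted curve accounts for. A minimal sketch of that calculation follows. It assumes an ordinary least-squares line for the linear fit and, for the exponential fit, a line fitted to the logarithms of the counts with the coefficient computed on the log scale. The text does not say exactly how the authors fitted their exponential curves, so the printed values need not agree with the tabulated ones. The function name is illustrative, and the data are the Chemical Abstracts counts for 1970-79 from the appendix.

```python
import math

def r_squared(xs, ys):
    """Squared correlation of xs and ys, i.e. R^2 of a straight-line fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

years = list(range(1970, 1980))
counts = [276674, 308976, 334426, 321005, 333642,
          392234, 390905, 410137, 428342, 436887]

linear_r2 = r_squared(years, counts)
# Exponential fit assessed on the log scale (an assumption of this sketch).
exponential_r2 = r_squared(years, [math.log(c) for c in counts])
print(f"linear {linear_r2:.3f}, exponential {exponential_r2:.3f}")
```

Comparing the two coefficients for the same series is the basis for judging whether growth over a period is better described as linear or exponential.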
TABLE 5
MAY'S CATEGORIZATION OF THE LITERATURE OF DETERMINANTS TO 1920

Category                        Number of Papers    Percentage
New ideas and results                 235               12
Applications                          208               10
Systematization and history           199               10
Texts and education                   266               13
Duplications                          350               18
Trivia                                737               37
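The percentages in table 5 follow directly from the paper counts. The short sketch below, offered only as a check on the arithmetic, recomputes them from the tabulated numbers; the variable names are illustrative.

```python
# Paper counts per category, as given in table 5.
papers = {
    "New ideas and results": 235,
    "Applications": 208,
    "Systematization and history": 199,
    "Texts and education": 266,
    "Duplications": 350,
    "Trivia": 737,
}

total = sum(papers.values())
# Share of each category, rounded to the nearest whole percent.
percentages = {name: round(100 * n / total) for name, n in papers.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

On these figures, duplications and trivia together account for more than half of the papers, while new ideas and results make up about one paper in eight.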
are closely correlated with new results, with some time lag. Pronounced
peaks are observed in texts, publications and trivia. May describes the
pattern as follows: "First the basic theory is worked out in close relation
to applications. Its successes lead to many textbooks and then to a rush
into the field of workers who inevitably lower over-all quality."26
Surprisingly, considering its importance to bibliometric
approaches to the growth of knowledge, May's study has not been
duplicated in other subfields. Of course, such analyses are very
time-consuming and require expert knowledge. A criticism can be made that
the assignment to categories is very subjective. Also, such a categorization
fails to recognize that some duplication is necessary to ensure that
new results reach a variety of audiences. However, in general, such
analyses can be very revealing.
To investigate the viability of May's approach in another subfield
and to familiarize ourselves with its problems, we applied a similar
analysis to studies of obsolescence of library materials. The corpus of
papers was obtained by checking the heading "Obsolescence of books,
periodicals, etc." in Library Literature from its first appearance in 1970
and then extending the set to include appropriate references contained
in the initial articles. The survey was restricted to English-language
items.
Because of the small number of papers, forty-six in all, they were
divided into four (rather than six) categories: (1) new ideas and results;
(2) new applications; (3) reviews and historical surveys; and (4) popularizations,
duplications, trivia. Initially, each paper was categorized by
two of the writers independently. Disagreements were then resolved by
discussion and more precise definition of the categories. The publication
dates ranged from 1944 to 1980. The numbers and percentages for
each category are given in table 6. Although not nearly so comprehen-
TABLE 6
LITERATURE OF OBSOLESCENCE, 1944-80
Number of Papers
Percentage
Number of Authors
13
11
3
28
Other
19
41
11
11
3
16
Category
24
Fig. 7. Numbers of innovative papers and total papers published on obsolescence, 1944-80.
[Plot not reproduced; horizontal axis: year.]
TABLE 7
CITATIONS PER ARTICLE FOR PAPERS ON OBSOLESCENCE, 1944-77
Article Category
Reviews
Other
No. Papers
13
11
3
19
Average No. Citations, Minimum No. Citations, Maximum No. Citations
12
7
6
4
1
0
4
28
14
8
23
Forecasts
In 1963, Price said: "There is a possibility the exponential law is
breaking down."32 Exponential growth cannot go on forever. Recent
figures seem to indicate that this change is indeed occurring. Price
predicts that, when limits to growth are imposed on such a process,
there will be various reactions: escalation of a new process, loss of
definition of the old process, divergent (i.e., widely fluctuating) oscillations,
or oscillations converging to the limit. Like Moravcsik, he feels
changing communication patterns among scientists, brought about by
new technology, will lead to a situation in which publications are of
secondary value in communicating innovations, serving popularization
rather than research needs.
Rescher believes that this quality drag principle (i.e., that an exponential
increase in the total number of papers is needed to produce a
linear increase in the number of first-rate papers) means that, eventually,
the pace of innovation (i.e., first-rate findings) will begin to
decline. He regards the exponential increase in publication not as
useless verbiage but as the useful and necessary inputs needed for
genuine advances. However, in an age of dwindling resources, the world
can no longer afford exponential input. Thus, growth in number of
publications will become linear; perhaps it has already become linear in
the seventies. The growth in cumulative number of first-rate publica-
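Rescher's principle, as summarized in this section, can be illustrated numerically. The sketch below assumes, purely for illustration, that the cumulative number of first-rate papers is proportional to the logarithm of the cumulative total. The constants and function names are hypothetical and are not taken from Rescher or from the data in this article.

```python
import math

def total_papers(year, n0=1000.0, rate=0.05):
    """Hypothetical exponential growth of the cumulative literature."""
    return n0 * math.exp(rate * year)

def first_rate_papers(total, k=10.0):
    """Assumed relation: first-rate output grows as the log of total output."""
    return k * math.log(total)

# Gain in first-rate papers over successive decades of exponential input.
steps = [
    first_rate_papers(total_papers(y + 10)) - first_rate_papers(total_papers(y))
    for y in range(0, 50, 10)
]
print([round(s, 3) for s in steps])
```

Under exponential input each decade adds the same number of first-rate papers, which is a linear increase. If the total instead grows only linearly, the same assumed relation yields ever smaller gains, which corresponds to the declining pace of innovation that Rescher anticipates.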
References
1. A. Conan Doyle. Quoted in Nicholas Rescher. Scientific Progress. Pittsburgh:
University of Pittsburgh Press, 1978, p. 54.
2. Popper, Karl. Objective Knowledge; An Evolutionary Approach. Oxford:
Clarendon Press, 1972, p. 144.
3. Rescher, Scientific Progress, p. 48.
4. Price, Derek de Solla. Little Science, Big Science. New York: Columbia
University Press, 1963; and ______. Science Since Babylon. New Haven, Conn.: Yale
University Press, 1961.
5. Crane, Diana. Invisible Colleges. Chicago: University of Chicago Press, 1972.
6. Lawson, J., et al. A Bibliometric Study on a New Subject Field; Energy
Analysis. Scientometrics 2(1980):227-37.
7. Frame, J. Davidson, et al. An Information Approach to Examining Developments
in an Energy Technology: Coal Gasification. Journal of the ASIS 30(July 1979):
193-201.
8. Crane, Invisible Colleges; Sullivan, Daniel, et al. The State of Science:
Indicators in the Specialty of Weak Interactions. Social Studies of Science 7(May 1977):167-200;
and Menard, Henry W. Science: Growth and Change. Cambridge, Mass.: Harvard
University Press, 1971.
Appendix
The counts upon which the figures are based are as follows:
Figure 1

Year    Chemical Abstracts      Year    Chemical Abstracts
1907         11,847             1944         43,700
1908         15,169             1945         33,672
1909         15,459             1946         39,578
1910         17,545             1947         39,288
1911         21,682             1948         43,996
1912         23,194             1949         53,441
1913         26,630             1950         59,098
1914         25,115             1951         63,033
1915         18,981             1952         70,147
1916         16,108             1953         75,091
1917         15,945             1954         80,615
1918         13,881             1955         86,322
1919         15,240             1956         92,396
1920         19,326             1957        102,525
1921         20,451             1958        118,930
1922         24,098             1959        127,196
1923         25,315             1960        134,255
1924         26,643             1961        146,893
1925         27,097             1962        169,351
1926         30,238             1963        171,404
1927         33,491             1964        189,993
1928         39,135             1965        197,083
1929         48,293             1966        220,303
1930         55,146             1967        242,527
1931         52,728             1968        232,508
1932         59,461             1969        252,320
1933         66,153             1970        276,674
1934         61,570             1971        308,976
1935         63,413             1972        334,426
1936         64,572             1973        321,005
1937         64,735             1974        333,642
1938         66,928             1975        392,234
1939         67,108             1976        390,905
1940         53,680             1977        410,137
1941         50,494             1978        428,342
1942         45,646             1979        436,887
1943         43,669
Figure 2

                        Number of Abstracts
Year    Science Abstracts    Biological Abstracts    Chemical Abstracts
1960         21,410                 72,530                134,255
1961         21,160                 87,000                146,893
1962         24,240                100,790                169,351
1963         26,000                 75,710                171,404
1964         31,000                107,100                189,993
1965         34,000                110,120                197,083
1966         38,000                120,100                220,303
1967         40,790                125,030                242,527
1968         50,480                130,020                232,508
1969         49,610                135,010                252,320
1970         79,830                140,030                276,674
1971         84,340                140,020                308,976
1972         85,180                140,000                334,426
1973         81,350                140,040                321,005
1974         83,370                140,020                333,642
1975         87,630                140,020                392,234
1976         74,180                142,510                390,905
1977         91,670                145,010                410,137
1978         96,580                149,010                428,342
1979        101,240                154,990                436,887
Figure 3

                        Number of Abstracts
Year    Historical    International Political    Sociological
        Abstracts     Science Abstracts          Abstracts
1960      2,925           1,461,000                 1,905
1961      2,776           1,510,000                 2,322
1962      3,096           1,415,000                 2,952
1963      3,926           1,355,000                 3,810
1964      3,623           1,467,000                 6,062
1965      3,363           1,471,000                 4,262
1966      3,516           1,492,000                 5,130
1967      3,527           1,574,000                 5,434
1968      3,417           1,450,000                 5,969
1969      4,180           1,693,000                 6,019
1970      4,015           2,206,000                 6,000
1971      6,406           2,244,000                 6,981
1972      6,359           2,998,000                 7,190
1973      7,607           4,555,000                 6,689
1974      7,244           4,955,000                 6,982
1975      8,779           5,015,000                 7,687
1976      9,094           5,039,000                 7,289
1977     15,414           5,040,000                 8,267
1978     15,675           5,075,000                 8,339
1979     15,692           5,105,000                 0
Figures 4 and 5

                  Number of Abstracts
Year    Library and Information    Psychological
        Science Abstracts          Abstracts
1960          1,003                    8,532
1961            968                    7,353
1962            986                    7,700
1963          1,052                    8,381
1964          1,054                   10,500
1965          1,104                   16,619
1966          1,106                   13,622
1967          1,053                   17,202
1968          1,226                   19,586
1969          2,567                   18,068
1970          2,858                   21,722
1971          2,619                   23,000
1972          3,177                   17,976
1973          3,037                   24,409
1974          3,837                   25,558
1975          3,870                   25,542
1976          3,781                   24,687
1977          4,721                   27,004
1978          4,886                   26,292
1979          4,217                   29,714
Figure 6
Year
No. of First Editions
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
79,289
78,875
87,604
95,433
97,469
103,679
112,300
110,715
Year
1944
1959
1960
1961
1963
1965
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
Number of
Innovative Papers
Total
1
Teaching Bibliometrics
ALVIN M. SCHRADER
BIBLIOMETRICS, THE SCIENTIFIC STUDY of recorded discourse, offers much
promise for enhancing university curricula in the informational
domain. This promise involves two dimensions of empirical knowledge,
a theoretical dimension and a practical dimension, and so ought
to interest not only researchers and educators but professional practitioners
as well. This promise issues from the special nature of empirical
knowledge, by which ideas about the world can be related to practical
activity. The special nature of such knowledge is derived from what
might be called a metatheory about the logic of inquiry. This metatheory
is outlined below.
Bibliometrics taken as theoretical knowledge is the quantitative
characterization of the properties of recorded discourse. Quantitative
characterization is the setting forth of probabilistically true ideas about
selected phenomena. These ideas express patterns, tendencies and regularities
that are said to be inherent in the phenomena. Such ideas,
because they describe general qualities, form empirical theory or just
theory. Maccia (now Steiner) and Maccia put it this way: "Understanding
should lead to explanation, because understanding provides
relationships or regularities which make sense of our happenings. To
explain is to appeal to regularities, i.e., to appeal to theory."2 Thus, the
objective of bibliometrics as a scientific study is to produce ideas (that
is, theory) about recorded discourse and its various important
properties.
Alvin M. Schrader is a doctoral candidate, School of Library and Information Science,
Indiana University, Bloomington.
These illustrations of impediments to the introduction of bibliometrics
into graduate library school curricula can be placed in the larger
perspective of major weaknesses in the knowledge base of educators and
researchers. The major weaknesses are seen to be their atheoretical
approach to problem-solving and their elementary descriptive
approach to quantification.
The atheoretical approach to problem-solving is illustrated pointedly
by the semantic confusion in the literature between theory and
philosophy, in that pleas for a philosophy of library science are taken to
be pleas for theory, and the terms are used interchangeably. Philosophy,
however, is value theory and is sorted out in logic and epistemology
from empirical theory, so that ideas about what ought to be and what
ought to be done are differentiated from ideas about what exists in the
world. Value theory is not a substitute for empirical theory, but rather,
as has been demonstrated already, is a necessary complement in development
inquiry which links theory to practice. In any event, pleas for a
philosophy of library science have usually boiled down to weak
attempts to rationalize the genteel empiricism in which educators and
researchers have functioned since the 1870s.
A second major weakness concerns educators' and researchers' traditionally
elementary approach to quantification. The charge is frequently
made that librarians are hostile to numeracy and quantitative
research, but this charge seems inadequate as a description of practitioners'
attitudes toward quantitative expression. In fact, numbers as
quantifiers of library activity and library services are not merely
simple-mindedly avoided or despised, but on the contrary are universally
employed to describe such variables as library holdings, book
circulation and salaries. The problem is not professional hostility, fear,
anxiety, or other psychoanalytic peculiarities brought by students to
graduate library schools. The problem is that educators and researchers
have left the professional community innumerate and deficient in dealing
adequately with quantification. How can graduates go beyond
elementary description of data if they have not been educated to do so?
How are they to learn that mere data collection is not the complete act of
research if their educators teach that it is? How are they to come to an
understanding of what Cole and Eales meant in 1917 by a statistical
analysis of a literature? Or what Hulme meant in 1923 by statistical
bibliography of scientific literature for documenting the history of
science? Or what Lotka meant in 1926 by the logarithmic frequency
distribution of scientists' productivity to the progress of science as
indicated by publications? Or what Bradford meant in 1934 by the
A rationale for moving bibliometrics into the mainstream of graduate
library school curricula has been set forth based on the logic of
inquiry. Indeed, bibliometric knowledge ought to be integrated into
existing courses and, at the same time, specialized programs ought to be
offered at both the MLS and Ph.D. levels for advanced study of both
theory and methodology. There is a growing body of researchers and
educators who are utilizing and extending bibliometrics, and some
scholarly community will no doubt lay claim to this domain in the near
future. If that scholarly community is not the library schools as presently
constituted, then there are other plausible claimants, including
(but not limited to) academic programs of information science, sociology
of knowledge, computer science, public policy, education, and
history and philosophy of science. Indeed, the pioneering advances in
relevant theory have so far come from scholars outside the library
schools, scholars such as Merton in the sociology of science, Kuhn in the
history of science, and Price in the history of science and medicine.
If none of the foregoing arguments for teaching bibliometrics has
been convincing, the only remaining appeal is to an observation attributed
by Pritchard to Fairthorne: "Numerical data may or may not be
dull, but they are the only alternative to thumping the table and affirming
one's intuitions."15
Proposal for an MLS Course in Bibliometrics
References
1. See Steiner, Elizabeth D. Logical and Conceptual Analytic Techniques for
Educational Researchers. Washington, D.C.: University Press of America, 1978; and
______. Notes on Methodology of Educational Theory Construction. Bloomington:
Indiana University, 1981. Mimeographed.
2. Maccia, Elizabeth S., and Maccia, George S. Use of SIGGS Theory Model to
Characterize Educational Systems as Social Systems. In Man in Systems, edited by Milton
Rubin, p. 170. New York: Gordon and Breach, 1971.
3. Wert, Lucille M., ed. Directory Issue--1980. Journal of Education for
Librarianship, vol. 20, 1980.
4. Busha, Charles H., and Harter, Stephen P. Research Methods in Librarianship;
Techniques and Interpretation. New York: Academic Press, 1980.
5. Narin, Francis, and Moll, Joy K. Bibliometrics. Annual Review of Information
Science and Technology 12(1977):35-38.
6. Broadus, Robert N. The Applications of Citation Analyses to Library Collection
Building. Advances in Librarianship 7(1977):299-335.
7. Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25(Dec. 1969):348-49.
8. Ferrante, Barbara K. Bibliometrics: Access in the Library Literature. Collection
Management 2(Fall 1978):199.
9. Cole, F.J., and Eales, N.B. The History of Comparative Anatomy. Part 1: A
Statistical Analysis of the Literature. Science Progress 11(April 1917):578-96.
10. Hulme, Edward W. Statistical Bibliography in Relation to the Growth of
Modern Civilization. London: Grafton, 1923.
11. Lotka, Alfred J. The Frequency Distribution of Scientific Productivity.
Journal of the Washington Academy of Sciences 16(June 1926):317-23.
12. Bradford, Samuel C. Sources of Information on Specific Subjects. Engineering
137(26 Jan. 1934):85-86.
13. Gosnell, Charles F. Obsolescence of Books in College Libraries. College &
Research Libraries 4(March 1944):115-25.
14. Thomas, Lewis. The Medusa and the Snail; More Notes of a Biology Watcher.
New York: Bantam, 1974, p. 133.
15. Pritchard, Alan. Statistical Bibliography: An Interim Bibliography. London:
North-Western Polytechnic School of Librarianship, 1969, p. 1.
16. Aiyepeku, Wilson O. Bibliometrics in Information Science Curricula. The
Information Scientist 9(March 1975):29-34.
17. Siegel, Sidney. Nonparametric Statistics for the Behavioral Sciences. New York:
McGraw-Hill, 1956.
18. Pratt, Allan D. The Analysis of Library Statistics. Library Quarterly
45(1975):275-86.
19. Many of the suggested projects are from the list of assignments for Dr. L. Houser's
bibliometrics course, University of Toronto, spring 1981.
20. Sweaney, Wilma P. An Empirical Test of the Incompatibility of the Two
Formulations of Bradford's Law. MLS research report, Faculty of Library Science,
University of Toronto, 1978.
21. Lotka, Frequency Distribution.
22. Pratt, Allan D. A Measure of Class Concentration in Bibliometrics. Journal of
the ASIS 28(Sept. 1977):285-92.
23. Voos, Henry G. Bibliometrics and Management of Libraries. Proceedings of
the ASIS Annual Meeting 14(1977):fiche 9-E4-9-E6.
24. Pritchard, Statistical Bibliography; and ______. Announcement in Radials
Bulletin, no. 2 (1979), p. 149.
25. Hjerppe, Roland. A Bibliography of Bibliometrics and Citation Indexing and
Analysis. Stockholm: Royal Institute of Technology Library, Dec. 1980.
26. The author wishes to thank Prof. L. Houser of the University of Toronto and
Prof. A. Pratt of the University of Arizona, Tucson (formerly of Indiana University) for
stimulating and supporting my interest in bibliometrics.
Appendix
BIBLIOMETRICS COURSE SYLLABUS*
1. Overview of the Field (1 unit)
This unit focuses on terminology, major concepts and reviews of the
literature.
Uncertainty about a variety of variables and their interconnections with
respect to scientific literatures was the impetus for bibliometric study. Some of
the initial questions were: Does the literature of a field represent the field? How
does the growth of a literature relate to the growth of scientific knowledge? What
are the essential characteristics constituting the structure of a literature? How do
various literatures compare with respect to structure? Who are the producers of a
literature? Who are its users? How are quantity and quality of literature production
related? These and later, more complex questions have attracted the attention
of increasing numbers of researchers and theoreticians in a wide spectrum
of academic disciplines. Among current difficult problems are: the functions of
referencing (intellectual property recognition, persuasion or window dressing);
the relationship between the cognitive structure of a discipline and its social
structure, particularly as manifested in communication and publishing patterns;
and the theoretical validity of bibliometrics in scholarly nonscientific
fields.
The rapidly advancing status of bibliometrics as a scholarly specialty is
indicated by its large body of literature, now well over 2000 publications, by the
recent appearance of at least three journals, and by the attendant review literature.
Particularly exciting is the international makeup of the research front,
comprising social scientists not only in the United States but also Russia,
Europe and England. Although bibliometric study began with the literatures of
the natural and biological sciences, social science literatures have also been
examined bibliometrically from time to time. In addition, there have been a
handful of attempts to apply the various techniques to some of the literatures of
the humanities disciplines.
Although there does not appear to be a consensus in the literature on the use
of the term bibliometrics, the various other descriptions represent subspecialty
thrusts. Recently, for example, Narin (1976) introduced the concept of evaluative
bibliometrics, which he defined as the quantitative measurement of the
properties of a literature in order to evaluate scholarly activity in a field. In
addition, there is the term scientometrics, the scientific analysis of science and
science policy. The latter focus was embodied in the formation in late 1978 of
Scientometrics; An International Journal for all Quantitative Aspects of the
Science of Science and Science Policy. This is the second of three recent, relevant
journals. The first was Social Studies of Science; An International Review of
Research in the Social Dimensions of Science and Technology (earlier entitled
Science Studies, from its inception in 1971 until the end of 1974). The third
journal, although of very recent origin, shows promising relevance. It is entitled
Knowledge: Creation, Diffusion, Utilization, and is aimed at bringing together
researchers, policy-makers, research and development managers, and other
practitioners engaged in the process of knowledge development. Of course,
there are also a number of journals relevant to bibliometrics within the history
and philosophy of science in terms of theoretical implications, notably the
British Journal for the History of Science. Another important indicator of
bibliometric advance was the inauguration in 1975 of the Society for Social
Studies of Science, colloquially known as the 4S, which was reported to have
attracted over 500 members by the end of its first year.
*A reference to an author during discussion of a unit has been footnoted only if the
reference does not appear in the accompanying list of readings.
A comprehensive review of the literature of bibliometrics was published by
Narin and Moll (1977), and a survey of developments to date by Hjerppe. In
addition, more than thirty doctoral dissertations and several monographs on
various aspects of bibliometrics have been published; among the notable monographs
are those by Price (1963, 1975), Narin (1976), Elkana (1978), Garfield
(1979), and Garvey (1979). (Two other monographs have attempted to present an
integrative overview of bibliometrics, Donohue and Nicholas and Ritchie, but
neither has proven satisfactory.
The definitive text awaits an author.)
Narin (1976) has mapped out three research fronts in the literature of
bibliometrics (see table 1). They are: (1) the size of the scholarly enterprise; (2) the
properties (i.e., structure) of the literature of each enterprise; and (3) the productivity
of scholarly authors.
Size of scholarly enterprise is generally expressed in terms of national or
international comparisons among literatures. Recently, attempts have been
made to correlate scientific productivity of a given country as indicated by its
scientific literature with national economic vitality. Such an index may become
particularly meaningful to the evaluation of progress in underdeveloped and
middle-power nations.
The structure of a literature is generally expressed in terms of relationships
among individual publications or among a set of publications such as journal
literature, in terms of links between researchers, or in terms of maps of disciplinary
phenomena. These relationships and links and maps can be used to identify
key events, advances and patterns of scholarly research. Newer work such as
cocitation analysis and multidimensional scaling can be used for evaluative
functions as well as description, in comparing productivity among authors,
journals or organizational entities such as funding agencies, university departments,
professional associations, or countries. Suggested readings for this unit
follow.
Terminology:
Ferrante, Barbara K. Bibliometrics: Access in the Library Literature. Collection
Management 2(Fall 1978):199-204.
Garfield, Eugene. Scientometrics Comes of Age. Current Contents: Life
Sciences 1(12 Nov. 1979):5-10.
Pritchard, Alan. Statistical Bibliography or Bibliometrics? Journal of
Documentation 25(Dec. 1969):348-49.
Wittig, Glenn R. Statistical Bibliography--A Historical Footnote. Journal of
Documentation 34(Sept. 1978):240-41.
TABLE 1
CHRONOLOGY OF MAJOR CONTRIBUTORS TO THE DEVELOPMENT OF BIBLIOMETRIC ANALYSES OF SCIENTIFIC LITERATURES
Size of the Literature
Structure of the Literature
Productivity
1910
Cole and Eales
1920
Hulme
Lotka
1930
Bradford
Wilson and Fred
1940
1950
Gosnell
(Bradford)
Fussler
Daniel and Louttit
1960
Kessler
Price
Bourne
Gottschalk and Desmond
Xhighnesse and Osgood
Barr
Price
Narin and Carpenter
(Zipf)
Lehman
Garfield
Schocklev
Westbrook
Price
Holzner, Burkhart, and Marx, John H. Knowledge Application; The Knowledge System in Society. Boston: Allyn and Bacon, 1979.
Merton, Robert K. The Sociology of Science; Theoretical and Empirical Investigations. Chicago: University of Chicago Press, 1978.
Price, Derek de Solla. Little Science, Big Science. New York: Columbia University Press, 1963.
______. Science Since Babylon. 2d ed. New Haven, Conn.: Yale University Press, 1975.
2. Theoretical Framework (2 units)
These units focus primarily on exogenous theory from the sociology of
science and from the history and philosophy of science. Recently, some promising
indigenous contributions from information science have been published.
One of these is Pritchard (1972), who attempted to relate bibliometrics to the
information transfer process, conceptualizing the flow of information through
channels as analogous to a chemical or industrial process. Another is Meincke
and Atherton (1976), who have introduced the difficult but interesting concept
of knowledge space or scientific space, in which concepts, fields of knowledge,
and information items in a retrieval system are likened to physical objects (such
as atoms) that occupy multidimensional vector space.
However, while theoretical advances in the sociology of science have been
spectacular, little progress has occurred in our understanding of the nature of
theoretical properties of the vast array of subject literatures. For example, P e r i d
has argued, convincingly, that citation analysis cannot properly be applied to
historical research because citations representing the source documents for
history cannot be sorted out from citations representing ordinary references.
This may well have been the difficulty in the analysis by Brace of citation
patterns in graduate library school doctoral dissertations, a large proportion of
which have always been historical research. The same validity problem arises
with respect to citation analysis of literary criticism studies.
Theoretical uncertainty goes deeper than this, however, for what we really
need to understand better is under what conditions a literature structure may be
said to be isomorphic to the referencing behavior and norms of its producers.
Scientific literature is assumed to be isomorphic, or more nearly isomorphic, to
the referencing behavior of scientific authors because scientists produce knowledge
by building on previous knowledge, and so they acknowledge the antecedent
work, the intellectual property, of their colleagues. Thus, both the scientific
advances and the citing may be regarded as cumulative. Garfield, Malin and
Small (1978) suggest that citation linkages in science reflect both the cognitive
structure and the social structure of a specialty; this argument has not yet been
adequately elaborated for empirical testing, however.
Like this theoretical hypothesis, there are many other challenges awaiting
bibliometric inquiry. Some of these are to produce adequate explanations of the
following problems and phenomena: how progress in scientific knowledge can
be objectively identified, and how such progress is reflected in the literature;
how the social systems of science and nonscientific scholarship differ, and how
they reflect differing communication patterns, differing referencing practices
and norms, and differing publication practices; how patterns of information
Sweaney, Wilma P. An Empirical Test of the Incompatibility of the Two Formulations
of Bradford's Law (MLS research report, Faculty of Library
Science). Toronto: University of Toronto, 1978.
Vickery, B.C. Bradford's Law of Scattering. Journal of Documentation
4(1948):198.
Wilkinson, E.A. The Ambiguity of Bradford's Law. Journal of Documentation
28(June 1972):122-30, 232 (erratum).
Lotka:
Allison, Paul D., et al. Lotka's Law: A Problem in Its Interpretation and Application.
Social Studies of Science 6(1976):269-76.
Coile, Russell C. Lotka's Frequency Distribution of Scientific Productivity.
Journal of the ASIS 28(Nov. 1977):366-70.
Lotka, Alfred J. The Frequency Distribution of Scientific Productivity.
Journal of the Washington Academy of Sciences 16(19 June 1926):317-23.
Vlachý, Jan. Frequency Distributions of Scientific Performance; A Bibliography
of Lotka's Law and Related Phenomena. Scientometrics
1(1978):109-30.
Recent advances:
Bookstein, Abraham. Explanations of the Bibliometric Laws. Collection
Management 3(Summer-Fall 1979):151-62.
Fairthorne, Robert A. Empirical Hyperbolic Distributions (Bradford-Zipf-Mandelbrot)
for Bibliometric Description and Prediction. Journal of
Documentation 25(Dec. 1969):319-43.
Garfield, Eugene. Bradford's Law and Related Statistical Patterns. Current
Contents: Life Sciences 2(12 May 1980):5-12.
Pratt, Allan D. A Measure of Class Concentration in Bibliometrics. Journal of
the ASIS 28(Sept. 1977):285-92.
Price, Derek de Solla. A General Theory of Bibliometric and Other Cumulative
Advantage Processes. Journal of the ASIS 27(Sept.-Oct. 1976):292-306.
______. Cumulative Advantage Urn Games Explained: A Reply to
Kantor. Journal of the ASIS 29(July 1978):204-06.
Shaw, W.M. Entropy, Information and Communication. Proceedings of the
ASIS Annual Meeting 16(1979):32-40.
4. Research Traditions: Empirical Descriptions (5 units)
Scholarly norms of citing are complex and vary from field to field and from
science to nonscience. Similarities in citing conventions between scientific
literatures and humanities literatures are not adequately understood, but
the social conventions determining citing behavior in a given field are crucial to
theoretically valid characterizations of the structure of the field's literature.
The citing of antecedent research is a strong social norm among scientists
and social scientists. Citation relationships are conceptualized as semantic
relations between texts that constitute directed lines connecting later to earlier
work. When these relations are graphed, they are said (borrowing from graph
theory) to form a digraph. Such a digraph reflects semantic textual structures
such that antecedent subject matter is linked to later subject matter. Citation
analysis relies on the occurrence of the social norms of citing, but there are many
other reasons for particular choices of prior authors and papers. As Lipetz (1965)
and Weinstock (1974), among others, have noted, these choices could be motivated by any of the following: paying homage to pioneers; providing background reading; giving an example; modifying, correcting, criticizing, or
refuting previous work; identifying the original publication of an eponymic
concept or term such as Pareto's law; or window dressing. Refinements in
citation analysis methodology are now being produced through contextual
analysis of references. Also, studies have been undertaken in science to assess the
correlation between citation data and peer judgments. Cole and Cole (1973) and
Zuckerman (1977), among others, have demonstrated that straight citation
counts are highly correlated with virtually every refined measure of research
quality and other forms of scientific recognition, such as the Nobel prize and
membership in a national academy of science.
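The digraph view of citation and the "straight citation count" mentioned above can be illustrated with a minimal sketch. The paper labels and reference lists below are hypothetical toy data, not drawn from any study cited here, and the function names are illustrative assumptions only. Each directed edge runs from a later (citing) paper to an earlier (cited) one, and a paper's citation count is simply its in-degree in that digraph.

```python
# Hypothetical toy data: each paper maps to the earlier papers it cites.
references = {
    "A": [],
    "B": [],
    "C": ["A"],
    "D": ["A", "B"],
    "E": ["A", "C", "D"],
}


def build_digraph(references):
    """Return the directed edges (citing, cited) of the citation digraph."""
    return [
        (citing, cited)
        for citing, refs in references.items()
        for cited in refs
    ]


def citation_counts(references):
    """Count how often each paper is cited (its in-degree in the digraph)."""
    counts = {paper: 0 for paper in references}
    for _, cited in build_digraph(references):
        counts[cited] = counts.get(cited, 0) + 1
    return counts


edges = build_digraph(references)
counts = citation_counts(references)
```

In this toy example paper A, cited by C, D and E, has the highest count. A count of this kind records only that a link exists; it says nothing about which of the motives listed above (homage, criticism, window dressing, and so on) prompted the citation.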
Thus, although errors or deviations in citing behavior do occur, the
accumulation of bibliographic links over hundreds or even thousands of acts of
citing over time is seen to map out the cognitive domain of scientific knowledge
in a given area; the self-correcting and cumulating nature of knowledge is a
probabilistic process that sloughs off the errors or deviations and dead-end
research programs. In effect, when an author cites, he is classifying his own work
with respect to the perceived domain of all prior scholarship.
What lends further credence to the validity of citation analysis, at least in
science, is the consensus factor; that is, the journal-refereeing system requires a
consensus among selected scholars on the worth of the work being submitted for
publication, and one of the criteria for judging such worth is coherence with
past research, presumably as represented by the researcher's choice of citations to
antecedent work. However, it should also be noted that citation anomalies
having a small effect on the average might have serious distorting effects in a
particular instance, for example, anomalies such as obliteration, eponyms and
highly unpopular claims like those of Arthur Jensen.
Thus, citing theory is in its infancy. Among the factors influencing the
nature and frequency of citation are the following: the size of the field and
number of authors in a field; the nature of the field, especially its degree of
theoretical integration or codification; whether a field is a paper- or product-producer, and especially what proportion of a field may be said to be engaged in
secret research, such as for military and industrial organizations; the age of a
field; differing growth rates of fields; journal editorial policies, such as rates of
publication, language of publication, length of articles; journal function (e.g.,
reporting research or current awareness); journal quality and prestige; author
eminence; average number of references per journal article; the degree of anomalous citation behavior in a field; perceived social utility of the field and funding
for research; rates of multiple versus single citation to a paper; rates of multiple
versus single authorship; variability in quality and importance of papers;
relationships between obsolescence and changes in journal size; and above all,
differential reference functions and norms among the sciences, social sciences,
technological fields, and the nonsciences. Suggested readings for this unit
follow.
Citation analysis:
Cawkell, A.E. Understanding Science by Analysing Its Literature. The
Information Scientist 10(March 1976):3-10.
Cole, J.R., and Cole, S. Social Stratification in Science. Chicago: University
of Chicago Press, 1973.
Garfield, Eugene. The Obliteration Phenomenon in Science--and the
Advantage of Being Obliterated! Current Contents: Life Sciences 18(22 Dec.
1975):5-7.
Garfield, Eugene. Citation Analysis and the Anti-Vivisection Controversy.
Current Contents: Life Sciences 20(25 April 1977):5-10; and Citation Analysis and the Anti-Vivisection Controversy. Part II. An Assessment of Lester R.
Aronson's Citation Record. Current Contents: Life Sciences 20(28 Nov.
1977):5-14.
Garfield, Eugene. Restating the Fundamental Assumptions of Citation Analysis.
Current Contents: Life Sciences 20(26 Sept. 1977):5-6.
Garfield, Eugene. High Impact Science and the Case of Arthur Jensen. Current
Contents: Life Sciences 21(9 Oct. 1978):5-15.
Garfield, Eugene. Is Citation Analysis a Legitimate Evaluation Tool? Scientometrics 1(1979):359-75.
Gilbert, G. Nigel. Referencing as Persuasion. Social Studies of Science
7(Feb. 1977):113-22.
Griffith, Belver C., et al. On the Use of Citations in Studying Scientific
Achievements and Communication. Society for Social Studies of Science
Newsletter 2 (Summer 1977):9-13.
Kaplan, Norman. The Norms of Citation Behavior: Prolegomena to the Footnote. American Documentation 16(July 1965):179-84.
Line, Maurice B., and Sandison, Alexander. Obsolescence and Changes in
the Use of Literature with Time. Journal of Documentation 30(Sept.
1974):283-350.
Porter, Alan L. Citation Analysis: Queries and Caveats. Social Studies of
Science 7(1977):257-67.
Price, Derek de Solla. The Citation Cycle. In North American Networking,
(collected papers, ASIS 8th mid-year meeting, Banff, May 1979), edited by
A.B. Piternick. Washington, D.C.: ASIS, 1979.
Small, Henry G. Co-citation in the Scientific Literature: A New Measure of
the Relationship between Two Documents. Journal of the ASIS 24(July-Aug. 1973):265-69.
Small, Henry G. Cited Documents as Concept Symbols. Social Studies of Science
8(Aug. 1978):327-40.
Voos, Henry G., and Dagaev, Katherine. Are All Citations Equal? Or, Did We
There is a great deal of controversy about the appropriateness of bibliometric applications to practical problems. Some authors have argued that underlying theoretical explanations of the bibliometric distributions are too weak to
guide information facility policy decisions, that bibliometric theory is not ready
for practical application. Others have urged even greater application, particularly to library collection management. Several reviews have been published,
notably those of Broadus (1977), Buckland (1978), Fitzgibbons (1980), and
Lancaster (1977). Moll edited a special issue in 1978 of Collection Management
devoted to bibliometrics in library collection management.
However, a number of major application problems have not been adequately addressed in the bibliometrics literature. First, most of the mathematical
models which have been proposed are static models, i.e., they assume fixed
economic conditions, for example, with respect to journal acquisitions costs
versus interlibrary loan costs, fixed subject areas, fixed user interests and homogeneous information demands, and fixed information facility objectives and
policies. Second, the models are simplistic and do not adequately reflect reality
in that they assume, but are unable to demonstrate operationally, that user
satisfaction can be defined and measured, and that individual user dissatisfaction is unimportant to the advance of scholarship. Third, the mathematical
models have weak explanatory power. They are unable, for example, to predict
the performance of new journals, new researchers and new papers. Fourth, the
variables in the models are only vaguely linked to sociological concepts. For
example, citation analysis treats the formal communication process, while use
and user studies concern demands on an information facility. Are identical or
highly dissimilar processes and modes of social communication behavior thus
being measured? How valid is the assumption that citations reflect information
facility use patterns? Fifth, almost all information facility objectives and, in
particular, collection policies are so unclearly expressed that they boil down to
assertions that cannot be operationalized and tested. Fundamental concepts
such as information need, user satisfaction, and even information facility use,
are inadequately articulated. Until information facilities begin to support
development inquiry on a grand scale, with funds for researchers rather than for
computers and computer applications, progress in applying bibliometric theory will be very slow. Finally, almost all the models and bibliometric explanations to date have been focused on scientific journal literatures, scientific
information facilities, and scientific researchers. More work is needed to determine what form practical applications should take in public and academic
libraries as they are presently constituted, with amorphous, heterogeneous user
populations exhibiting highly diversified demand patterns.
These are some of the difficult but challenging problems ahead. Suggested
readings for this unit follow.
Reviews of the literature:
Broadus, Robert N. The Applications of Citation Analyses to Library Collection Building. Advances in Librarianship 7(1977):299-335.
Buckland, Michael K. Ten Years' Progress in Quantitative Research on
Libraries. Socio-Economic Planning Sciences 12(1978):333-39.
Fitzgibbons, Shirley A. Citation Analysis in the Social Sciences. In Collection Development in Libraries: A Treatise, edited by George B. Miller and
Robert D. Stueart, pp. 291-344. Greenwich, Conn.: JAI Press, 1980.
Lancaster, F. Wilfrid. The Measurement and Evaluation of Library Services.
Washington, D.C.: Information Resources Press, 1977, pp. 327-67.
Moll, Joy K., ed. Special Issue on Bibliometrics. Collection Management,
vol. 2, Fall 1978.
Readings:
Allen, Edward S. Periodicals for Mathematicians. Science 70(20 Dec.
1929):592-94.
Baughman, James C. Towards a Structural Approach to Collection Development. College & Research Libraries 38(May 1977):241-48.
Bourne, C.P. Some User Requirements Stated Quantitatively in Terms of the
90 Percent Library. In Electronic Information Handling, edited by A. Kent
and O.E. Taulbee, pp. 93-110. Washington, D.C.: Spartan Books, 1965.
Drott, M. Carl, et al. Bradford's Law and Libraries: Present Applications--Potential Promise. Aslib Proceedings 31(June 1979):296-304.
Garfield, Eugene. Citation Analysis as a Tool in Journal Evaluation. Science
178(Nov. 1972):471-79.
Notes
1. Hjerppe, Roland. An Outline of Bibliometrics and Citation Analysis.
Stockholm: Royal Institute of Technology, 1978.
2. Donohue, Joseph C. Understanding Scientific Literature: A Bibliometric
Approach. Cambridge, Mass.: MIT Press, 1973.
3. Nicholas, David, and Ritchie, Maureen. Literature and Bibliometrics. London:
Clive Bingley, 1978.
4. For reviews of Donohue's monograph, see: American Libraries 5(July-Aug.
1974):368; Brookes, Bertram C. Nature 249(May 1974):496-97; Dikeman, R.K. American
Reference Books Annual 6(1975):138-39; Lancaster, F. Wilfrid. Newsletter on Library
Research, no. 11 (March 1974), pp. 7-11; Narin, Francis, and Voos, Henry. Journal of the
ASIS 26(March-April 1975):129; Rosenberg, Betty. Information Storage and Retrieval
10(Dec. 1974):420-21; Swisher, Robert. RQ 14(Fall 1974):75-76; Vaillancourt, Pauline M.
Library Journal 99(Sept. 1974):2045; and Wilkinson, Elizabeth. Journal of Documentation 30(Dec. 1974):438. For reviews of Nicholas and Ritchie's monograph, see: Culnan,
Mary J. Information Processing and Management 15(1979):170; and Morrison, Perry D.
College & Research Libraries 39(Sept. 1978):414-15.
5. Peritz, B. Cheila. Research in Library Science as Reflected in the Core
Journals of the Profession: A Quantitative Analysis (1950-1975). Ph.D. diss., Florida
State University, 1978.
6. Brace, William. A Citation Analysis of Doctoral Dissertations in Library and
Information Science, 1961-1970. Ph.D. diss., Case Western Reserve University, 1975.