Introduction to
Information Science
and Technology
All rights reserved. No part of this book may be reproduced in any form or by
any electronic or mechanical means, including information storage and
retrieval systems, without permission in writing from the publisher, except by
a reviewer, who may quote brief passages in a review. Published by
Information Today, Inc., 143 Old Marlton Pike, Medford, New Jersey 08055.
Publisher’s Note: The editors and publisher have taken care in preparation of
this book but make no expressed or implied warranty of any kind and assume no
responsibility for errors or omissions. No liability is assumed for incidental or
consequential damages in connection with or arising out of the use of the infor-
mation or programs contained herein.
www.infotoday.com
Contents
CHAPTER 1
Our World of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
CHAPTER 2
Foundations of Information Science and Technology . . . . . . . . . . . . 9
CHAPTER 3
Information Needs, Seeking, and Use . . . . . . . . . . . . . . . . . . . . . . . . . 27
CHAPTER 4
Representation of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
CHAPTER 5
Organization of Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
CHAPTER 6
Computers and Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
CHAPTER 7
Structured Information Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
CHAPTER 8
Information System Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
CHAPTER 9
Evaluation of Information Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 129
CHAPTER 10
Information Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
CHAPTER 11
Publication and Information Technologies . . . . . . . . . . . . . . . . . . . 157
CHAPTER 12
Information Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
CHAPTER 13
The Information Professions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
CHAPTER 14
Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Collaborators and Contributors
Hamid Ekbia
Kathryn La Barre
Liliano Sergio Maderna (in memoriam)
Hanna M. Söderholm
Fred Sonnenwald
Chicca Stitt
Miles J. Stitt Jr.
The editors appreciate the advice, patience, and support of the many con-
tributors to Introduction to Information Science and Technology.
Introduction
A Collaborative Book
Introduction to Information Science and Technology is a collaborative book.
Volunteers with knowledge in their respective areas have provided draft
chapters reflecting their perspectives and knowledge. These were reviewed
and enhanced by the initial authors and other experts using a wiki hosted by
the American Society for Information Science and Technology. This book is a
product of that collaborative effort, edited to serve as an introductory text.
One might think that an introduction for such an important field would be
easily written. This is not the case. Each type of information system (the web,
online databases, libraries, etc.) has a largely separate literature. Attention is
typically restricted by technology, usually to computer-based information
systems, or is focused on one function, such as retrieval, disregarding the
broader context. What is published may be specialized, technical how-to
writing with localized terminology and definitions. For example, publica-
tions on theory are often narrowly focused on such topics as logic, probabil-
ity, or physical signals. This diversity has been compounded by confusion
arising from inadequate recognition that the word information is used by
various people to denote different things.
This text attempts to alleviate these problems by encouraging contribu-
tors to write at an introductory level, engaging additional readers and edi-
tors to broaden and strengthen the work, and testing a draft with students in
an Introduction to Information Science course. The wiki that supported the
collaboration retains the more detailed and specialized contributions and
has helped the editors focus this edition of the book as an introduction to
the field.
The American Society for Information Science and Technology provides
all members access to the wiki. It presents considerably more detail, opin-
ions, and interpretations than could be included in this print edition. It is our
hope that the wiki version will continue to evolve and that a second edition
of this book will benefit from the interaction the wiki supports.
CHAPTER 1
Our World of Information
In the 1980s, Ithiel de Sola Pool and his colleagues investigated the growth
of information (measured in words) supplied by the media in the U.S. and
Japan (Neuman & Pool, 1986; Pool, 1983; Pool, Inose, Takasaki, & Hurwitz,
1984). They analyzed the number of words supplied and consumed as well as
the average price per word. They reported that available information was
shifting from print to electronic media, the price per word was falling dra-
matically, and although the rate of consumption was increasing (at 3.3 per-
cent per year), it was falling ever further behind the amount of information
supplied. These findings have implications for information overload, infor-
mation diversity, and the economics necessary to sustain vibrant, creative
industries in journalism and popular and high culture.
Neuman, Park, and Panek (2010) extended Pool’s work to cover the period
from 1960 through 2005. They found a tremendous increase in the ratio of
supply to demand. In 1960, 98 minutes of media were available for every 1
minute of media consumed: Choices had to be made, but the number of
choices was within reason. By 2005, more than 20,000 minutes of mediated
content were available for every minute consumed. This, they point out, “is
not a human-scale cognitive challenge; it is one in which humans will
inevitably turn to the increasingly intelligent digital technologies that created
the abundance in the first place for help in sorting it out—search engines,
TiVo’s recommendation systems, collaborative filters” (p. 11). Neuman and
colleagues also found a change from push to pull technologies: Traditional,
one-way broadcast and publishing media push content, with the audience
accepting the decisions of newspaper editors and network executives. Today,
technologies are evolving to pull in audience members, who have more
choice and more control than ever before over what they watch and read, and
when. Search engines (especially Google) and social networking sites (e.g.,
YouTube, Facebook) are emerging as major influences on public opinion and
popular culture.
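To make the scale of this shift concrete, the two reported ratios imply an average annual growth rate that is easy to compute. The following is a back-of-the-envelope sketch in Python; the roughly 12.5 percent figure is our own calculation from the reported ratios, not one the authors state.

```python
# Ratio of media minutes supplied to minutes consumed
# (figures as reported by Neuman, Park, & Panek, 2010)
ratio_1960 = 98        # 98 minutes available per minute consumed in 1960
ratio_2005 = 20_000    # more than 20,000 minutes per minute consumed in 2005
years = 2005 - 1960

# Implied compound annual growth rate of the supply-to-demand ratio
growth = (ratio_2005 / ratio_1960) ** (1 / years) - 1
print(f"The supply-to-demand ratio grew about {growth:.1%} per year")
# -> about 12.5% per year, sustained over four and a half decades
```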
Science, Popper says, is a process that takes place in all three worlds: In
World 1, events happen; in World 2, we try to make sense of them; and in
World 3, we try to explain the events while others react to these explanations
and try to improve on them. Thus, we bring the three worlds together to cre-
ate information (or awareness) through a never-ending process that pro-
duces knowledge. Along the way we create tools and technologies that help
this process.
To take a less philosophical, more practical view, information reaches us
from records of historical events, scientific knowledge, religious or cultural
knowledge, art and literature, personal knowledge and records, documenta-
tion of governments or organizations, business, commerce, and many other
sources.
Information may arrive prepackaged from a variety of sources. Publishers,
government agencies, and other organizations produce formal products
such as books, journals, and databases. Individuals package information in
email, blogs, wikis, and other forms. Various institutions handle these pack-
ages. Libraries customarily deal with books, journals, video and audio
recordings, microforms, databases, and even manuscripts, papyri, and clay
tablets. Archives typically house governmental records, personal papers, and
manuscripts. Databases (some commercially compiled and others available
for free on the internet) also provide access to books, journals, webpages,
blogs, videos, and other sources.
All of these various “packages” of information can be considered to be
information systems (micro and macro) created to achieve some purpose.
They may also be considered to be (micro and macro) communications sys-
tems, so that the information in them can be satisfactorily transferred: from
the package to someone who wants the information or from one package to
another package. However, all communications systems have potential prob-
lems. Information science seeks to analyze, design, and evaluate these sys-
tems in order to understand and improve how they function.
4. We organize it (in our heads and in our files) and may make new
information.
For centuries people have noted (or complained) that there is too much
information in the world. In 1755 French encyclopedist Denis Diderot wrote
4. Did it save money in the short run and the long run?
From the perspective of the legal research service Virtual Chase (2008a), the
following criteria are valuable for assessing the quality of information:
• Accuracy
• Timeliness
• Completeness
• Ease of understanding
• Trustworthiness of source
References
Bohn, R. E., & Short, J. E. (2009). How much information? 2009 report on American consumers.
San Diego, CA: Global Information Industry Center, University of California–San Diego.
Retrieved November 11, 2010, from hmi.ucsd.edu/pdf/HMI_2009_ConsumerReport_Dec9_2009.pdf.
Diderot, D. (1975–1995). Encyclopédie. In Oeuvres complètes (H. Dieckmann et al., Eds.) (vol.
7, pp. 174–262). Paris: Hermann. Original work published in 1755.
Neuman, W. R., Park, Y. J., & Panek, E. (2010). Tracking the flow of information into the home:
An empirical assessment of the digital revolution in the U.S. from 1960–2005. Ann Arbor,
MI: University of Michigan. Retrieved November 11, 2010, from
www.wrneuman.com/Flow_of_Information.pdf.
Neuman, W. R., & Pool, I. S. (1986). The flow of communications into the home. In S. Ball-
Rokeach & M. Cantor (Eds.), Media, audience and social structure (pp. 71–86). Beverly
Hills, CA: Sage.
Pool, I. S. (1983). Tracking the flow of information. Science, 221, 609–613.
Pool, I. S., Inose, H., Takasaki, N., & Hurwitz, R. (1984). Communications flows: A census in the
United States and Japan. Amsterdam: Elsevier North Holland.
Popper, K. (1979). Objective knowledge: An evolutionary approach (rev. ed.). New York: Oxford
University Press.
Toffler, A. (1970). Future shock. New York: Random House.
Virtual Chase. (2008a). Criteria for quality of information. Retrieved November 11, 2010,
from virtualchase.justia.com/quality-criteria-checklist.
Virtual Chase. (2008b). How to evaluate information—Checklist. Retrieved November 11,
2010, from virtualchase.justia.com/how-evaluate-information-checklist.
Wurman, R. S. (1989). Information anxiety. New York: Doubleday.
CHAPTER 2
Foundations of Information
Science and Technology
science means, and the chapter concludes with an account of the field’s intel-
lectual and historical roots.
Data
Data is the plural of datum, derived from Latin dare (to give); hence, data is
“something given.” Some style manuals insist that data be used only in the
plural; it may, however, be used as a collective noun: a plural noun used in
the singular to denote a set of items. Machlup (1983) wrote:
We may conclude that what is considered data is relative: What some con-
sider the given (or input), others may consider the output. From the per-
spective of information science, it is important to represent and
communicate not just data but also its background, its reception, and the
theoretical assumptions connected with data, which makes the concepts of
knowledge and document important.
Knowledge
The classical definition goes back to Plato: Knowledge is justified true belief.
This definition is problematic, however, because knowledge is always open to
modification and revision, so that very little (or nothing) can be considered
knowledge in Plato’s sense. This is why pragmatic and materialist theories
consider the concept of knowledge in relation to human practice: Knowledge
expands the actors’ possibilities to act and to adjust to the world in which they
live. Pragmatism and materialism consider human practice the final criterion
of knowledge and see experimentation as an integrated component.
The Oxford English Dictionary definitions of knowledge include 1) “skill or
expertise acquired in a particular subject ... through learning;” 2) “that which
is known;” and 3) “being aware or cognizant of a fact, state of affairs, etc.”
(OED Online, “knowledge”).
For example:
for their handling. For both archives and libraries, the custodian of the doc-
uments was usually a noted scholar in one or more subjects and only mini-
mally concerned with the development of principles regarding management
and use. The Encyclopaedia Britannica (“Library,” 2010) puts it concisely:
“Although the traditional librarian acted primarily as a keeper of records, the
concept of an active service of advice and information eventually appeared
as a legitimate extension of the role of custodian.”
Just as rudimentary rules and methods for arranging and describing
library and archival materials had begun to emerge prior to the 18th century,
so had the art of bibliography, usually with attention to completeness in
some subject or type of materials. Conrad Gesner’s (1516–1565) Bibliotheca
Universalis (published in 1545) provides an excellent example. Gesner devel-
oped and expanded general principles of inclusion, arrangement, and index-
ing (Jackson, 1974). Indexing principles, or at least techniques, had even
earlier origins, particularly in dealing with sacred texts. Most of these early
techniques and practices were well described in Gabriel Naudé’s (1600–1653)
Advice on Establishing a Library (1627), a work that became influential in its
time on issues regarding bibliography, library management, classification,
cataloging, and related topics.
The bibliographic “urge” of Gesner and others was matched by the desire
of scholars to understand and organize all of human knowledge. Plato and
Aristotle were likely the first to state this goal; during the 17th and 18th cen-
turies, many others began work on such endeavors. A classification system of
the world’s knowledge and the draft of an encyclopedia to contain it, pro-
duced in 1620 by Francis Bacon (1561–1626), were foundations for later
developers. Gottfried von Leibnitz (1646–1716), librarian, philosopher, math-
ematician, and logician, followed in Bacon’s footsteps with his classification
system, again with attention to organizing all of the world’s knowledge.
Others, such as Denis Diderot (1713–1784) and Jean d’Alembert (1717–1783),
continued this work (“Library,” 2010). The idea of compiling, classifying, and
making available the world’s knowledge would later greatly influence those
in library science, documentation, and information science.
By the 18th century, a number of advances in the library arts began to
appear: national and subject bibliographies, printed library catalogs, new
schemes for subject arrangement of materials on shelves, and principles for
bibliography. Also, the first museums were opened to the public in this
period. Formerly, these collections, commonly called “cabinets of curiosities,”
were the private holdings of royalty. The British Museum, established in 1753,
opened to the public in 1759 and was followed by the Louvre Museum in Paris
in 1793.
The rapid expansion of the number of libraries of all types during the early
19th century led to more extensive writings about libraries and library man-
agement. Martin Schrettinger (1772–1851), a German librarian, was the first
person to use the term library science; in 1808 he employed the term
In the U.K., Anthony Panizzi, librarian for the British Museum, developed
91 rules for author-title entries, bringing consistency to cataloging work.
These rules, first published in 1841, continued to influence all developers of
library catalogs through the remainder of the 19th century. In 1877, the
Library Association of the United Kingdom was founded after an interna-
tional conference of librarians.
In Europe, the École des Chartes, founded in France in 1821, improved the
formal education and training of librarians, archivists, and bibliographers.
Karl Franz Otto Dziatzko (1842–1903), a professor of library science at the
In the late 1800s and early 1900s, the number of museums, particularly in
the U.S., expanded, and a professional cadre of museum specialists devel-
oped to support them. John Cotton Dana, director of the Newark Public
Library, was a pioneer in this area. In 1925 Dana, one of the founders of the
Special Libraries Association in the U.S., established the Newark Museum in
the same building as the library and began an apprenticeship program that
would foster in museum curators a broad knowledge, including familiarity
with library processes. A key part of the expertise of these curators was the
development of exhibitions and loan programs that met the needs of the
local population (Given & McTavish, 2010).
By 1934, Otlet had systematized most of his ideas about the science of
documentation into his text, Traité de documentation (Otlet, 1934/1980). For
Otlet, documentation represented “the means of bringing into use all of the
written or graphic sources of our knowledge” (as cited in Rayward, 1997, p.
299), as well as a new intellectual discipline. Although some scholarly librar-
ians objected, the documentation idea had considerable influence in
Europe; it was essentially unknown elsewhere in the world. Special librarians
in the U.S. and the U.K. did pick up some of these ideas and approaches (par-
ticularly in the detailed indexing of a wide variety of documentary materials
and specialized information services; Muddiman, 2005; Williams, 1997). The
European documentation movement influenced Watson Davis, who founded
the American Documentation Institute in 1937 to study the problems of the
distribution of scientific information.
After World War II, Suzanne Briet, director of studies at the National
Institute for Techniques in Documentation in Paris, expanded Otlet’s ideas by
defining documents broadly, to include much more than text. Buckland
(2009) describes her views as “semiotic,” treating documents as “indexical
signs exposing an unlimited horizon of networks of techniques, technolo-
gies, individuals, and institutions” (p. 79).
In the early part of the 20th century, library science gradually gained
acceptance as the preferred term to describe the study of the management of
libraries and library services. University-level educational programs along
the Columbia University model were established at several other institutions
even though the predominant training approach remained within libraries.
Charles C. Williamson’s study of library education, published in 1923,
strongly recommended moving library education to universities (Davis,
1994); in the U.S. this move was largely complete by the 1950s. Williamson’s
report also emphasized the importance of research in meeting the challenges
of library science; in 1926 the University of Chicago established the first PhD
program in library science.
As the Graduate Library School at Chicago took shape, the faculty debated
the focus of the program. Pierce Butler’s small booklet An Introduction to
Library Science (1933/1961) emphasized the philosophical principles of
librarianship rather than its scientific aspects. Between 1950 and 1970, edu-
cation for librarianship moved from the expectation of an undergraduate
degree to a requirement for a master’s as the first professional degree. The
establishment of the U.S. National Archives in 1934 gave impetus to the
archival community’s development of an archival science.
The explosion of scientific documentation during and after World War II
challenged librarians and the scientific community. Scientists, dissatisfied
with what they perceived as slow and cumbersome cataloging and classifica-
tion processes and ineffective retrieval methods, introduced new technolo-
gies: punched cards and, later, computers. Scientists and special librarians
collaborated in this work. Principles and techniques for coordinate indexing
and searching were developed and adopted in many libraries and informa-
tion centers. Automatic indexing and abstracting, machine translation, and
remote searching of databases were tried, with both successes and failures.
Initially, these pioneers called themselves scientific information specialists,
but this cumbersome title was inadequately descriptive of what they were
trying to do: develop a science of information.
A rich outpouring of new retrieval systems, new technologies for storage,
and new ways of subject control took place between 1950 and 1970, much of
it funded by the National Science Foundation, the U.S. Air Force, and other
government agencies. Systems were developed for automatic indexing,
machine translation, thesaurus construction, evaluation of retrieval
effectiveness, citation indexing, and online retrieval. These experiments and systems
brought new participants to the new field, people with backgrounds in com-
puter science, linguistics, behavioral sciences, mathematics, and communi-
cations. Textbooks for the new field called it information retrieval,
information storage and retrieval, or information science. Documentation or
information science was influenced by Vannevar Bush’s idea of Memex
(Buckland, 2006) and the information theory work of Claude Shannon,
Warren Weaver, and Norbert Wiener on telecommunications systems.
Definitions of this new field were offered (borrowed largely from the
European documentalists) (e.g., Borko, 1968; Shera & Egan, 1950; Simon,
1947; Tate, 1950) but no definition seemed to suit everyone (Lipetz, 2005).
References
Ackoff, R. L. (1989). From data to wisdom. Journal of Applied Systems Analysis, 16, 3–9.
Bateson, G. (1972). Steps to an ecology of mind. New York: Ballantine.
Bauer, W. F. (1996). Informatics and (et) Informatique. IEEE Annals of the History of
Computing, 18(2). Retrieved November 11, 2010, from
www.softwarehistory.org/history/Bauer1.html.
Borko, H. (1968). Information science: What is it? American Documentation, 19(1), 3–5.
Braganza, A. (2004). Rethinking the data-information-knowledge hierarchy: Towards a case-
based model. International Journal of Information Management, 24(4), 347–356.
Briet, S. (1951). Qu’est-ce que la documentation? Paris: Documentaires Industrielles et
Techniques.
Buckland, M. (1991). Information and information systems. New York: Greenwood Press.
Buckland, M. (2006). Emanuel Goldberg and his knowledge machine: Information, invention,
and political forces. Westport, CT: Libraries Unlimited.
Buckland, M. (2009). As we may recall: Four forgotten pioneers. ACM Interactions, 16(6),
76–79.
Butler, P. (1961). An introduction to library science. Chicago: University of Chicago Press.
(Original work published 1933)
Case, D. O. (2007). Looking for information: A survey of research on information seeking,
needs, and behavior. London: Academic Press.
Davis, D. G. (1994). Education for librarianship. In W. A. Wiegand & D. G. Davis (Eds.),
Encyclopedia of library history (pp. 184–186). New York: Garland.
Ellis, D. (1992). The physical and cognitive paradigms in information retrieval research.
Journal of Documentation, 48(2), 45–64.
Encyclopaedia. (2010). Encyclopædia Britannica. Retrieved February 18, 2010, from
www.britannica.com/EBchecked/topic/186603/encyclopedia.
Given, L. M., & McTavish, L. (2010). What’s old is new again: The reconvergence of libraries,
archives and museums in the digital age. Library Quarterly, 80(1), 6–30.
Hayes, R. M. (1994). Information science and librarianship. In W. A. Wiegand & D. G. Davis
(Eds.), Encyclopedia of library history (pp. 275–280). New York: Garland.
Hjerppe, R. (1994). A framework for the description of generalized documents. Advances in
Knowledge Organization, 4, 173–180.
Redmond-Neal, A., & Hlava, M. K. (2005). ASIS&T thesaurus of information science, technol-
ogy, and librarianship (3rd ed.). Medford, NJ: Information Today for the American Society
for Information Science and Technology.
Reitz, J. M. (2007). Online dictionary for library and information science. Westport, CT:
Libraries Unlimited. Retrieved November 11, 2010, from lu.com/odlis.
Shannon, C. E., & Weaver, W. (1964). The mathematical theory of communication. Urbana:
University of Illinois Press. (Original work published 1949)
Shera, J. H., & Egan, M. E. (1950). Documentation in the United States. American
Documentation, 1(1), 8–12.
Simon, E. N. (1947). A novice on “documentation.” Journal of Documentation, 2(2), 238–241.
Spang-Hanssen, H. (2001). How to teach about information as related to documentation.
Human IT, 1(1), 125–143. Retrieved November 11, 2010, from www.hb.se/bhs/ith/1-
01/hsh.htm.
Tate, V. D. (1950, January). Introducing American Documentation. American Documentation,
1(1), 3.
Williams, R. V. (1997). The documentation and special libraries movement in the United
States, 1910–1960. Journal of the American Society for Information Science, 48(9),
775–781.
WordNet. (2006). Princeton, NJ: Princeton University. Retrieved April 21, 2008, from
wordnet.princeton.edu/perl/webwn?s=information+science.
CHAPTER 3
Information Needs,
Seeking, and Use
multiple methods of data collection and extend their investigations for a long
enough time that (nearly) all contextual issues can emerge.
Activity theory can provide a holistic view of information practices in
which the individual subjects and their collective relationships, the objects
used, and the tools or technology employed are treated as equally important
and in which situated and historical context is taken seriously. Activity theory
stresses the development of cognition as a unity of biological development,
cultural development, and individual development. It has a strong ecological
and functional-historical orientation. It focuses on the activity of the subject
and the object orientation of this activity. Hjørland (1997) noted that activity
theory “stresses the ecological and social nature of meaning. … A person’s use
of a term may be determined not by his individual usage, but by the usage of
some social group to which he semantically defers. Therefore, the content of
a person’s thoughts are themselves in part a matter of social facts” (p. 81).
information preferences. The bias exists both before and after a decision has
been made. It is evident in political decision making, where voters’ prefer-
ences are often reflected in the news sources they use. When preparing a legal
case, attorneys seek information that supports their argument; they also seek
contradicting information in order to be aware of how the other side might
present its case. Information professionals should not assume that people
are neutral or objective when seeking information.
Schulz-Hardt and colleagues noted that heterogeneous groups in which a
minority has an “alternative” view tend to seek information more broadly
and develop reports that reflect both views. The authors note that conver-
gence and divergence draw on different cognitive and motivational
processes. Divergence involves scrutinizing the problem and assessing the
alternatives available, whereas convergence involves committing oneself to a
particular perspective and upholding it against opposing forces. The under-
lying model of optimal decision making determines the approach used.
Sometimes it is better to examine the alternatives deeply; in such cases, con-
firmation bias is problematic. In other cases, it is important to reach a deci-
sion quickly; in such cases, confirmation bias is functional.
Dervin’s Sense-Making
Brenda Dervin’s (1999) approach, although not strictly a model, provides a
useful perspective on how interpretative or naturalistic research methods
can be used to study information seeking (Case, 2007). Dervin criticized
approaches that view information as an objective entity that exists apart
from humans; she holds that information is not a brick that can be used to fill
human “buckets” needing information. Instead, individuals construct infor-
mation as they face gaps in their understanding of the world. Gaps occur in
situations that are unique to the individual; bridging these gaps requires the
individual to construct information, to make sense of the situation. Different
people will perceive gaps (and bridges) differently. Research using this para-
digm tends to focus on emotional (affective) perspectives as well as cognitive
concerns.
1. Demographic characteristics
3. Information needs
4. Thoughts (cognition)
5. Feelings
8. Information sources
9. Outcomes
Information Literacy
Information literacy traces back to libraries’ interest in bibliographic instruc-
tion, later known as user education. Grassian (2004) notes that bibliographic
instruction is based in the physical library and is tool based, focuses on the
mechanics of using those tools, and is tied to course assignments.
Information literacy, however, has no physical constraints, is concept-based,
helps people learn how to learn, and supports learning outcomes of aca-
demic programs. The American Library Association and Association for
College and Research Libraries (2000) have adopted standards for informa-
tion literacy:
Digital Literacy
The term digital literacy was used in the 1980s, generally to mean the ability
to deal with hypertextual information (in the sense of computer-supported,
non-sequential reading) (Bawden, 2001). Gilster (1997) expanded the con-
cept of digital literacy in his book of the same name. Rather than a set of
skills, competencies, or attitudes, Gilster viewed digital literacy as an ability
to understand and use information from a variety of digital sources—it is
simply literacy in the digital age: the ability to read, write, and otherwise deal
with information using the technologies and formats of the time. Other
authors have used digital literacy to denote a broad concept linking together
other relevant literacies and those based on communication technology
competencies and skills, but they have focused on “softer” skills of informa-
tion evaluation and knowledge assembly, together with a set of understand-
ings and attitudes (Bawden, 2008; Martin, 2006, 2008).
In summary, we can say that digital literacy is the set of attitudes, under-
standings, and skills to handle and communicate information and knowl-
edge effectively in a variety of media and formats. Some definitions include
communicating; those with a records management perspective mention
deleting and preserving. Sometimes the resolution is sharper, with finding
broken down into subprocesses such as choosing a source, retrieving, and
accessing. In an age when information comes mainly in digital form, digital
literacy would seem essential; however, it must be adopted with the caveat
that an important part of digital literacy is knowing when to use a nondigital
source.
Digital literacy in this sense is a framework for integrating various other
literacies and skill sets, although it does not need to encompass them all; as
Martin (2006) put it, we do not need “one literacy to rule them all” (p. 18).
Although it might be possible to produce lists of the components of digital lit-
eracy and show how they fit together, it is not sensible to try to reduce it to a
finite number of linear stages. Nor is it sensible to suggest that one specific
model of digital literacy will be appropriate for all people, or indeed for one
person over a lifetime. Updating of understanding and competence will be
1. Underpinnings
• Literacy per se
2. Background knowledge
This kind of knowledge was assumed for any educated person in the days
when information came in books, newspapers, magazines, academic jour-
nals, professional reports, and not much else and was largely accessed
through physical print-on-paper libraries. The well-understood publication
chain—from author to archivist, passing through editors, publishers, book-
sellers, librarians, and the rest—lasted as a sensible concept well into the
computer age. Now it seems outdated, and there is no clear model to replace
it. Nonetheless, gaining as good an understanding as possible of what the
new forms of information are and where they fit into the world of digital
information has to be an essential start in being digitally literate.
3. Central competencies
• Evaluation of information
• Knowledge assembly
• Information literacy
• Media literacy
These are the skills and competencies, building on the basic underpin-
nings, without which any claim to digital literacy has to be regarded skepti-
cally. They are remarkably wide-ranging, and it would be sobering to try to
assess to what degree they are possessed in the various countries of the world.
• Independent learning
These attitudes and perspectives are perhaps what create the link
between the new concept of digital literacy and an older idea of literacy, in
vogue more than 200 years ago. It is not enough to have skills and compe-
tencies; they must be grounded in some moral framework, strongly associ-
ated with being an educated, or as our ancestors would have said, a “lettered”
person. Of all the components of digital literacy, a moral framework may be
the most difficult to teach or inculcate, but it comes closest to living up to the
meaning of information in its Latin root informare—the transforming, struc-
turing force.
Independent learning and moral and social literacy are the qualities
attributed to a person with the motivation and mind-set to make best use of
information. They provide the basis for understanding the importance of
information and of “right dealing” with information resources and commu-
nication channels, as well as the incentive to continue to improve one’s
capabilities.
Taken together, these four components may seem to present a very ambi-
tious set of competencies and attitudes to demand of anyone. Yet they seem
to be what is needed to cope and to succeed in today’s information environ-
ment. In particular, this form of digital literacy is a powerful aid in avoiding a
number of the problems and paradoxes of information behavior—informa-
tion overload, information anxiety, information avoidance, and the like
(Bawden & Robinson, 2009).
References
American Library Association and Association for College and Research Libraries. (2000).
Information literacy competency standards for higher education. Retrieved November 11,
2010, from www.ala.org/ala/mgrps/divs/acrl/standards/standards.pdf.
Bates, M. J. (1989). The design of browsing and berrypicking techniques for the online search
interface. Online Review, 13(5), 407–424.
Bates, M. J. (2002, September). Toward an integrated model of information seeking and
searching. Keynote speech at the Fourth International Conference on Information Needs,
Seeking and Use in Different Contexts, Lisbon, Portugal. Retrieved November 11, 2010,
from www.gseis.ucla.edu/faculty/bates/articles/info_SeekSearch-I-030329.html.
Bates, M. J. (2005). Berrypicking. In K. E. Fisher, S. Erdelez, & L. McKechnie (Eds.), Theories of
information behavior (pp. 58–62). Medford, NJ: Information Today.
Bawden, D. (2001). Information and digital literacies: A review of concepts. Journal of
Documentation, 57(2), 218–259.
Bawden, D. (2008). Origins and concepts of digital literacy. In C. Lankshear & M. Knobel
(Eds.), Digital literacies: Concepts, policies and paradoxes (pp. 15–32). New York: Peter
Lang.
Bawden, D., & Robinson, L. (2009). The dark side of information: Overload, anxiety and other
paradoxes and pathologies. Journal of Information Science, 35(2), 180–191.
Belkin, N. J. (2005). Anomalous state of knowledge. In K. E. Fisher, S. Erdelez, & L. McKechnie
(Eds.), Theories of information behavior (pp. 44–48). Medford, NJ: Information Today.
Bruner, J. (1990). Acts of meaning: Four lectures on mind and culture. Cambridge, MA:
Harvard University Press.
Case, D. O. (2007). Looking for information: A survey of research on information seeking,
needs, and behavior (2nd ed.). Amsterdam: Elsevier.
Case, D. O., Andrews, J. E., Johnson, J. D., & Allard, S. L. (2005). Avoiding versus seeking: The
relationship of information seeking to avoidance, blunting, coping, dissonance, and
related concepts. Journal of the Medical Library Association, 93(3), 353–362.
Catts, R., & Lau, J. (2008). Towards information literacy indicators. Paris: UNESCO.
Dervin, B. (1999). On studying information seeking methodologically: The implications of
connecting metatheory to method. Information Processing & Management, 35(6),
727–750.
Ellis, D. (2005). Ellis’s model of information-seeking behavior. In K. E. Fisher, S. Erdelez, & L.
McKechnie (Eds.), Theories of information behavior (pp. 138–142). Medford, NJ:
Information Today.
Erdelez, S. (1999). Information encountering: It’s more than just bumping into information.
Bulletin of the American Society for Information Science, 25(3). Retrieved November 11,
2010, from www.asis.org/Bulletin/Feb-99/erdelez.html.
Fisher, K. E., Erdelez, S., & McKechnie, L. (Eds.). (2005). Theories of information behavior.
Medford, NJ: Information Today.
Gilster, P. (1997). Digital literacy. New York: Wiley.
Grassian, E. (2004). Building on bibliographic instruction. American Libraries, 35(9), 51–53.
Hjørland, B. (1997). Information seeking and subject representation: An activity-theoretical
approach to information science. Westport, CT: Greenwood Press.
CHAPTER 4
Representation of Information
“Call me a taxi,” says the first speaker. “Okay,” says the second,
“you’re a taxi.”
This old joke exemplifies the inability to distinguish between a direct and
an indirect object in English (except by context). Other languages (Latin and
German, for example) use an accusative case to identify direct objects and a
dative case for indirect objects. There can be no ambiguity, but neither can
there be the same kind of humor—something of a trade-off. The dative and
accusative cases are also used to differentiate between static and dynamic
situations; in German, for example, “das Buch liegt auf dem Tisch” means
“the book lies (or is lying) on the table” but “ich lege das Buch auf den Tisch”
means “I lay (or am laying) the book on the table.” The case is indicated by
the form of the article following the preposition. Perhaps the reason many
Americans have trouble distinguishing between lie and lay is the lack of
inflection (i.e., ways to distinguish the case of a noun) in the language. Some
have suggested that English speakers may also be reluctant to use the word
lie because of its look-alike’s meaning, prevarication.
German, the principal Teutonic language most closely related to English,
has four cases rather than three: nominative, genitive, dative, and accusative.
Nominative is virtually the same in each language, dative and accusative
have been identified briefly already, and genitive is equivalent to possessive.
Latin is even more complicated, having seven cases: nominative, genitive,
dative, accusative, ablative, vocative, and locative. The ablative case primarily
preserves distinctions between and among different kinds of prepositional
phrases. (For a taste of the importance possessed by Latin prepositions, con-
sider that de implies down from, ex means out of, and ab suggests away from.
Their impact on derivatives in English is profound: descend [climb down
from], extract [draw out of], absurd [away from the rational].) Languages such
as Latin are called inflected because the grammatical function of words can be
inferred from their form rather than from their position in a sentence.
English and other modern languages often represent something of a
hybrid between inflected and uninflected types. In addition, usage is con-
stantly changing. American English differs substantially from British
English—partly the result of geographic separation and partly the result of
changing roles in the world. Both the U.S. and the U.K. have absorbed mil-
lions of immigrants from an extraordinary variety of countries. However,
modern commerce and communication technologies have forced the inter-
action of all peoples as never before. Naturally, this has led to a considerable
sharing of terms and may be causing greater linguistic homogeneity—not
just among the English-speaking peoples, but globally.
Semantic ambiguity causes difficulty, whether the language is written or
spoken. One needs to distinguish between homographs (words or other sym-
bols that are written the same way but have different meanings) and homo-
phones (words that sound the same but mean different things). The generic
term is homonym, for words with different meanings but the same spelling or
pronunciation. Although they are usually harmless when written (but
embarrassing, as when one uses there for their, for instance), homophones
can create considerably more confusion in conversations (“The oar/ore will
not float.”).
Machine Translation
In 1947, Warren Weaver became one of the first to propose that computers of
the day, which were essentially large calculating machines, could eventually
be used to translate text from one language to another. In a letter to cyber-
neticist and linguist Norbert Wiener, Weaver wrote, “When I look at an article
in Russian I say, ‘This is really written in English, but it has been coded in
some strange symbols. I will now proceed to decode’” (Hutchins, 1997, p. 22).
Taken at face value, the implication is that machine translation should be
a trivial matter of employing a substitution cipher, replacing, for example, a
Russian word by its English equivalent. However, as Hofstadter (1997, p. 521)
has noted, it is more likely that Weaver, who was fascinated by the subtleties
of language, was merely trying to be provocative.
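A toy example shows how far the “substitution cipher” view can and cannot go. The following sketch performs word-for-word lookup; the three-entry Russian–English lexicon is invented for illustration:

```python
# A naive "substitution cipher" translator: replace each Russian word with an
# English equivalent from a lexicon (lexicon invented for illustration).
lexicon = {
    "я": "I",
    "читаю": "read",    # present tense "am reading" is flattened to "read"
    "книгу": "book",    # the accusative case ending is simply discarded
}

def substitute(sentence):
    # Look each word up in the lexicon; pass unknown words through unchanged.
    return " ".join(lexicon.get(word, word) for word in sentence.split())

print(substitute("я читаю книгу"))  # -> "I read book"
# The missing article ("a book"), the verb's aspect, and the grammatical
# information carried by the case ending have no word-for-word counterpart,
# so the output is ungrammatical: translation requires syntax and context,
# not mere lookup.
```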
The difficulties of getting a computer to translate passages should be
apparent; successive generations of programmers have met with only partial
success. In addition to commercially available software for various natural lan-
guages, AltaVista, Babelfish, Google Language Tools, and other translation
services are currently available on the World Wide Web. Such services work
best with short, unambiguous passages but may produce inadequate trans-
lations of complicated text.
Hofstadter (1997) illustrated the pleasures and pitfalls of human transla-
tions of poetry, then remarked about machine translation:
• For some reason, the author chooses to share the idea with others.
To do so successfully, the idea must be represented in such a way
that the person receiving the message in which it is contained is
able to understand the meaning. This may be through movement,
sound, or image, but more commonly the information is expressed
in language.
in the 1950s as a generic term for the study of all kinds of information objects,
including texts, pictures, sounds, and objects.
Analysis is the first step in providing intellectual access to documents.
Technical (computer-based) approaches to analyzing the subject or informa-
tion content of a document may focus on microstructure or macrostructure.
Microstructure analysis begins with syntactic analysis of sentences (parsing)
and paragraphs, followed by semantic analysis; macrostructure analysis uses
methods such as data extraction (conversion of knowledge from the textual
format into a more structured representation, such as frames). A variety of
theoretical perspectives have been developed to explain how to analyze doc-
uments or information objects.
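As a concrete illustration of the macrostructure step, a simple extractor might map a sentence that fits a known pattern into a frame of named slots. The sketch below is only illustrative; the pattern and slot names are invented, and real data extraction uses far richer linguistic analysis:

```python
import re

# A toy "frame" extractor: convert a sentence matching a known pattern into
# a structured representation with named slots (pattern and slots invented).
PATTERN = re.compile(r"(?P<agent>[A-Z]\w+) published (?P<work>.+) in (?P<year>\d{4})")

def extract_frame(sentence):
    match = PATTERN.search(sentence)
    return match.groupdict() if match else None

print(extract_frame("Otlet published the Traite de documentation in 1934"))
# -> {'agent': 'Otlet', 'work': 'the Traite de documentation', 'year': '1934'}
```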
Content Analysis
Content analysis focuses on the actual content of texts and seeks to describe
their meaning objectively and systematically, within the context of commu-
nication. Krippendorff (2004) defines content analysis as “a research tech-
nique for making replicable and valid inferences from texts (or other
meaningful matter) to the contexts of their use” (p. 18). To make such extrap-
olations, however, the context of the texts’ creation is considered first.
The subject matter of the text is less important here than the recurrent
themes and concepts it contains. The process of content analysis involves
identifying the structures of, and patterns within, a text, which can then be
used as a basis for making inferences to other states or properties
(Krippendorff, 2004). Texts can be characterized and categorized by their
Contextual Analysis
Context itself is a kind of text that can be interpreted, thus manifesting the
shared concepts and meanings from which texts are constructed. Contextual
analysis is a foundation for discourse, hermeneutic, and semiotic analysis.
Analysis considers the traditions, customs, and practices in which com-
munication occurs; these may be considered from the perspective of cul-
tural, social, ideological, organizational, disciplinary, epistemological, and
professional strata. Boundaries between strata are not always clear because
they influence each other. These strata are used to consider the texts, their
creators, genres of communication, semiotic representation of concepts, and
disciplinary discourses.
The content of a text is considered a social phenomenon shaped by its
context, which includes the circumstances under which the text was created
and its relationship with other texts. A text can be considered as a node in a
network. Barthes (1977) sees a text as “a methodological field” that “exists in
the movement of a discourse” (p. 156). Individual texts are involved in an
intertextual and material weave that forms their context: Any text is thus an
embodiment of connections across a literature base, linking various works,
and displaying and articulating a particular discourse. Barthes (1977)
describes the text as follows:
Discourse Analysis
Discourse analysis seeks to identify the implicit assumptions in the use of
language by identifying the relationships among a text, its discursive prac-
tices, and the larger social context that shapes both text and discourse.
Discourse analysis assumes that the resources and strategies (lexis and gram-
mar, rhetorical formations, typical cultural narratives, genres, the principles
of constructing thematic formations, etc.) used in producing texts are char-
acteristics of a community rather than unique to a discursive event in that
community.
Pure semiotics does not encompass institutional frameworks or the social
context of a text; these are examined in discourse analysis, sometimes known
as social semiotics. In social semiotics, discourse is seen as involving the
larger linguistic unit, or the text as a whole, and discourse analysis examines
how meaning is made within a text by identification of the characteristics of
the community that creates and uses the text. This method of analysis assists
in discovering recondite meaning: It is broader, and yet more penetrating,
than content analysis because the hidden or encrypted meanings of the texts
are examined in order to discover intended or unintended social or political
effects.
Language use within a discourse effectively constitutes social practice,
action, and interaction, playing a role in the construction of the reality of a
particular community—its identities, social relations, and power struggles.
Language constructs reality or particular versions of an experienced world,
positioning the reader to subscribe to particular beliefs, where some “truths”
are accepted and others rejected.
Within a particular community, the use of certain metaphors, symbolic
characterizations, and other assumptions can become naturalized (cf.
Barthes, 1994, “natural information”) and are considered unproblematic,
even transparent or “given.” This naturalization, however, privileges certain
attitudes or positions and, in so doing, precludes others. It is the task of dis-
course analysis to reveal the institutions and practices of power behind such
naturalizations; this is accomplished by revealing the motivations and poli-
tics behind assumptions, or conceptualizations of an entity, and facilitating
open and informed debate.
Hermeneutic Analysis
Hermeneutics provides the philosophical grounding for interpretivism
because it is concerned with the discovery of meaning and coherence, par-
ticularly in texts, and how prior understandings and prejudices shape the
interpretive process. Hermeneutics can be treated as both an underlying phi-
losophy and a specific mode of analysis (Bleicher, 1980). As the latter, it pro-
vides a way of understanding textual data, in which the parts or the whole of
Semiotic Analysis
Semiotics, literally the study of signs, seeks to establish the meanings of
terms (or signifiers) used in a text, how meaning is determined, and conse-
quently, how social reality is created and shared within a community.
Content and discourse analysis can be considered types of semiotic analysis.
Semiotic analysis considers the epistemological, theoretical, or disciplinary
positions of the texts.
Texts are most often read only superficially, and the meaning of terms
used is assumed at a lexical level. However, Barthes (1994) argues that by
accepting terms at face value, “we take them for ‘natural’ information” (p.
158), not considering any hidden meanings they may suggest. Semiotics pre-
sumes that signifiers are located within the conventions of particular social
and cultural contexts and that terms therefore have ontological meanings
unique to a particular community. The various levels of semiotic analysis
facilitate a deeper understanding of signs within a context because they are
able to expose the conventions at work within a particular text and to reveal
the implied or understood meanings of terms.
Barthes (1994) maintains that the first task of the semiological undertak-
ing is to divide the texts into minimal significant units. The importance of an
idea is revealed by the frequency with which a term signifying that idea
appears in the text.
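As a first, purely mechanical approximation of this procedure, one can divide a text into word tokens and count term frequencies. Real semiotic analysis works with meaningful units rather than raw word forms, so the Python sketch below is only illustrative:

```python
import re
from collections import Counter

# Divide a text into minimal units (here, lowercase word tokens) and count
# how often each term appears, as a crude proxy for the importance of ideas.
text = ("The sign, the signifier, and the signified are the units "
        "of semiotic analysis; the signifier recurs throughout.")

tokens = re.findall(r"[a-z]+", text.lower())
print(Counter(tokens).most_common(3))
# -> [('the', 5), ('signifier', 2), ...]
```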
Semiotic codes are sets of practices that are familiar to a particular cul-
tural community—these assist in communication of meaning within the
community. Codes are themselves signifying systems, providing conventions
for organizing signs into systems that correlate signifiers and signifieds;
codes also transcend single texts by linking them into an interpretative
framework. Knowledge of the framework or code is necessary for both the
author and the reader of a text: They both provide and constrain meaning.
Typically, external clues indicate what code is being employed; examples of
such clues include the text’s layout and progression of introduction, refer-
ences, and tables.
What a term represents is considered and interpreted by the members of
a community and matched against their mental models and ontological
commitments. Terms may be used metaphorically or symbolically within the
framework of a particular code, but such interpretations can be made clear
only through consideration of the context of their use. Coding systems can be
used effectively only if their users know how they are constructed and for
what purposes.
4.4. Abstracting
The international standard ISO 214-1976 defines the abstract as “an abbrevi-
ated, accurate representation of the contents of a document, without added
interpretation or criticism and without distinction as to who wrote the abstract”
(International Organization for Standardization, 1976, p. 1). According to the
ANSI/NISO Z39.14 guidelines for abstracts (American National Standards
Institute/National Information Standards Organization, 1997), an abstract
typically reports a document’s:
• Purpose
• Methods
• Results
• Conclusions
Examples of Abstracts
Two examples of abstracts are presented here. They are taken from Colorado State
University’s Writing@CSU Project (writing.colostate.edu/guides/documents/
abstract/pop2c.cfm).
Descriptive abstract:
“Bonanza Creek LTER [Long Term Ecological Research] 1997
Annual Progress Report.”
We continue to document all major climatic variables in the
uplands and floodplains at Bonanza Creek. In addition, we have
documented the successional changes in microclimate in nine
successional upland and floodplain stands at Bonanza Creek
(BNZ) and in four elevational locations at Caribou-Poker Creek
4.5. Indexing
Indexing is the representation of a document (or part of a document or an
information object) in a record or an index for the purpose of retrieval.
Library catalogs, bibliographical databases, and back-of-the-book indexes
are common examples.
The representation may identify the originators of the document, its pub-
lisher, its physical properties, and its subjects. Descriptive indexing empha-
sizes physical properties such as originator, publisher, and date and place of
publication; subject indexing emphasizes what the document is about.
Subject indexes may be prepared by human intellectual analysis or by means
of computer-based statistical analyses of word frequencies. After analysis to
determine the document’s subjects, the subjects are “translated” into index-
ing terms (or other symbols, such as classification codes). Many indexes use
natural language, often words taken from the document. In other cases the
indexing is represented by controlled vocabulary or classification (see
Chapter 5).
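To picture the two kinds of indexing side by side, here is a schematic record in Python; the field names and values are invented for illustration, not drawn from any particular standard:

```python
# A schematic index record, separating descriptive indexing (physical and
# publication properties) from subject indexing (what the document is about).
record = {
    # Descriptive indexing
    "originator": "Doe, J.",
    "publisher": "Example Press",
    "date": 2010,
    "place": "Medford, NJ",
    # Subject indexing
    "keywords": ["information retrieval", "indexing"],  # natural language, from the document
    "controlled_terms": ["Indexing, Automatic"],        # controlled vocabulary (see Chapter 5)
}
```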
Keyword Indexing
Keyword indexing extracts words from a document in order to describe its
subject. Early work in this area was conducted in 1958 by H. P. Luhn at the
IBM research library. He created the keyword in context (KWIC) index,
using titles of research articles. The keywords formed a column near the
middle of the page, and portions of the author and title provided a snippet
of context (Figure 4.1). Luhn soon reformatted the output to produce a key-
word out of context (KWOC) index (Figure 4.2). Luhn’s work was one of the
earliest applications of computers for text processing.
[Figure 4.1: A keyword in context (KWIC) index. Keywords such as indexing, information, and intelligence are aligned in a central column, each flanked by a snippet of the surrounding author and title text and followed by a document number.]
Indexing
70-874 Stevens, M. (1970). Automatic indexing: A state-of-the-art report. National Bureau of …
Information
75-005 Doyle, L. B. (1975). Information retrieval and processing. Los Angeles: Melville.
73-173 Keenan, S. (1973). Progress in automatic indexing and prognosis for the future. In J. A. …
59-302 Luhn, H. P. (1959). Auto-encoding of documents for information retrieval systems. In M. …
61-043 Tomeski, E., & Westcott, R. W. (Eds.) (1961). The clarification, unification & …
58-306 Taube, M., & Wooster, H. (Eds.). (1958). Information storage and retrieval: Theory, …
Intelligence
61-659 Luhn, H. P. (1961). Automated intelligence systems: Some basic problems and prerequisites …

Figure 4.2 A keyword-out-of-context (KWOC) index: each keyword serves as a heading, followed by the documents in which it appears
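The mechanics of a KWIC index are simple enough to sketch in a few lines of Python (a minimal illustration only, not Luhn’s original program; the stopword list and sample titles here are invented for the example):

STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to"}

titles = [
    ("70-874", "Automatic indexing: a state-of-the-art report"),
    ("59-302", "Auto-encoding of documents for information retrieval systems"),
]

entries = []
for doc_id, title in titles:
    words = title.split()
    for i, word in enumerate(words):
        key = word.strip(":,.").lower()
        if key in STOPWORDS:
            continue                       # skip nonsignificant words
        left = " ".join(words[:i])         # context preceding the keyword
        right = " ".join(words[i + 1:])    # context following the keyword
        entries.append((key, left, word, right, doc_id))

# Sort by keyword and print with the keyword in a central column.
for key, left, word, right, doc_id in sorted(entries):
    print(f"{left[-28:]:>28}  *{word}*  {right[:28]:<28} [{doc_id}]")

Each significant word becomes a sort key, so the same title appears once for every keyword it contains—exactly the rotation visible in Figure 4.1.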
Citation Indexing
The works cited in a scholarly document also provide clues to what the doc-
ument is about (its subject). Citation indexes are databases of documents’
bibliographic references; they allow searchers to search by citation chaining:
backward chaining follows a document’s references to the older works it cites,
and forward chaining locates newer documents that cite the one in hand.
Citation indexes can also reveal documents that have citations in common.
Citation indexes have been used for generations in legal research
(Shepard’s Citations were first published in 1873; Stevens, 2002). The princi-
ple was applied in other fields beginning in the 1960s, first with the Science
Citation Index from the Institute for Scientific Information. Web-based ver-
sions, considering both bibliographic references and hyperlinks, are being
developed (see Chapter 11).
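The chaining operations a citation index supports can be pictured as lookups over a table of cited references. The Python sketch below uses hypothetical document identifiers; a real citation index performs the same lookups over many millions of records:

# cites[d] lists the (older) documents that d cites.
cites = {
    "smith2005": ["jones1990", "luhn1958"],
    "chen2008":  ["smith2005", "luhn1958"],
    "park2009":  ["smith2005"],
}

def backward_chain(doc):
    # Follow a document's references to the older works it cites.
    return cites.get(doc, [])

def forward_chain(doc):
    # Locate newer documents that cite the one in hand.
    return [d for d, refs in cites.items() if doc in refs]

def co_citation(a, b):
    # Documents that cite both a and b (citations in common).
    return [d for d in cites if a in cites[d] and b in cites[d]]

print(forward_chain("luhn1958"))             # ['smith2005', 'chen2008']
print(co_citation("smith2005", "luhn1958"))  # ['chen2008']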
References
American National Standards Institute/National Information Standards Organization.
(1997). ANSI/NISO Z39.14 – 1997: Guidelines for Abstracts. New York: Author.
Ashworth, W. (1967). Abstracting. In Handbook of Special Librarianship and Information
Work (3rd ed.) (pp. 453–481). London: Aslib.
Bar-Hillel, Y. (1960). The present status of automatic translation of languages. Advances in
Computers, 1(1), 92–163.
Barthes, R. (1977). From work to text. In Image, Music, Text (S. Heath, Trans.) (pp. 155–164).
New York: Hill and Wang.
Barthes, R. (1994). The kitchen of meaning. In The semiotic challenge (R. Howard, trans.) (pp.
157–159). Berkeley: University of California Press.
Berthon, P., Pitt, L., Ewing, M., & Carr, C. L. (2002). Potential research space in MIS: A frame-
work for envisioning and evaluating research replication, extension, and generation.
Information Systems Research, 13(4), 416–428.
Bleicher, J. (1980). Contemporary hermeneutics: Hermeneutics as method, philosophy and cri-
tique. London: Routledge & Kegan Paul.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chowdhury, G. G. (2004). Introduction to modern information retrieval (2nd ed.). London:
Facet.
Chu, H. (2003, 2010). Information representation and retrieval in the digital age. Medford, NJ:
Information Today, Inc.
Cremmins, E. T. (1996). The art of abstracting (2nd ed.). Arlington, VA: Information Resources
Press.
Ding, C. H. Q. (2005). A probabilistic model for latent semantic indexing. Journal of the
American Society for Information Science and Technology, 56(6), 597–608.
Dumais, S. T. (2004). Latent semantic analysis. Annual Review of Information Science and
Technology, 38, 189–230.
Gadamer, H. G. (1976). Philosophical hermeneutics (D. Linge, Trans. & Ed.). Berkeley:
University of California Press.
Hofstadter, D. R. (1997). Le ton beau de Marot: In praise of the music of language. New York:
Basic Books.
Hutchins, J. (1997). Milestones in machine translation: Episodes from the history of com-
puters and translation. Language Today, 3, 22–23.
International Organization for Standardization. (1976). ISO 214-1976: Documentation:
Abstracts for publications and documentation. Geneva, Switzerland: Author.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.).
Thousand Oaks, CA: Sage.
Lancaster, F. W. (2003). Indexing and abstracting in theory and practice (3rd ed.). Champaign:
University of Illinois, Graduate School of Library and Information Science.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to latent semantic analysis.
Discourse Processes, 25, 259–284. Retrieved September 27, 2009, from lsa.colorado.edu/
papers/dp1.LSAintro.pdf.
Levy, D. M. (1995). Cataloging in the digital order. Proceedings of Digital Libraries 95.
Retrieved November 11, 2010, from www.csdl.tamu.edu/DL95/papers/levy/levy.html.
Luhn, H. P. (1960). Keyword-in-context index for technical literature. American Documentation,
11(4), 288–295.
McWhorter, J. H. (2001). The power of Babel: A natural history of language. New York: Henry
Holt.
Nicholas, D., Huntington, P., & Jamali, H. R. (2007). The use, users, and role of abstracts in the
digital scholarly environment. Journal of Academic Librarianship, 33(4), 446–453.
Ou, S., Khoo, C. S. G., & Goh, D. H. (2008). Design and development of a concept-based
multi-document summarization system for research abstracts. Journal of Information
Science, 34(3), 308–326.
Rush, J. E., Salvador, R., & Zamora, A. (1971). Automatic abstracting and indexing:
Production of indicative abstracts by application of contextual inference and syntactic
criteria. Journal of the American Society for Information Science, 22(4), 260–274.
Šauperl, A., Klasinc, J., & Lužar, S. (2008). Components of abstracts: Logical structure of schol-
arly abstracts in pharmacology, sociology, and linguistics and literature. Journal of the
American Society for Information Science and Technology, 59(9), 1420–1432.
Schmandt-Besserat, D. (1992). Before writing. Austin: University of Texas Press.
Stevens, A. M. (2002). Finding, reading, and using the law. Albany, NY: Delmar.
CHAPTER 5
Organization of Information
Authority Control
People’s names and the titles of works (especially music) often appear in var-
ious forms: an author may use a nickname in one place and the full name in
another, add a middle initial, or change his or her last name (e.g., when mar-
ried or divorced); and sometimes two or more authors may have the same
name. To reduce confusion and increase the chances of finding the item
needed, information professionals are responsible for choosing unambigu-
ous forms for names and titles that might be confused. For example, the
author Mark Twain was named Samuel Langhorne Clemens by his parents; he
also published under the name Quintus Curtius Snodgrass. The Library of
Congress established the authorized form of his name as “Twain, Mark,
1835–1910” (the dates of birth and death provide additional clarification).
Information professionals develop and maintain syndetic (from the Greek
words for bind together) structures that identify the preferred term, lead the
user to it, and suggest additional descriptors. For names and titles
of works, the approach is straightforward: The see reference leads the user
from the nonstandard form (Quintus Curtius Snodgrass) to the preferred
term (Twain, Mark, 1835–1910). The possible alternatives are more numer-
ous in the case of subject terms, which are handled through thesauri and lists
of subject headings.
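At its core, an authority file is a mapping from variant name forms to a single authorized heading. A minimal Python sketch, using the Twain example above (the dictionary stands in for a full authority record structure):

# see_refs maps variant (nonpreferred) forms to the authorized heading.
see_refs = {
    "Clemens, Samuel Langhorne": "Twain, Mark, 1835-1910",
    "Snodgrass, Quintus Curtius": "Twain, Mark, 1835-1910",
}

def authorized_form(name):
    # Return the preferred heading, following a see reference if one exists.
    return see_refs.get(name, name)

print(authorized_form("Snodgrass, Quintus Curtius"))
# -> Twain, Mark, 1835-1910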
Thesauri record, among other relationships, the hierarchical links
between generic and specific terms. The examples in Figure 5.1 are selected
from the Thesaurus of Psychological Index Terms (2007).
To index a collection, the information professional selects the descriptors
or subject headings from the controlled vocabulary that best represent (see
Chapter 4) the content of each item. The number of descriptors or subject
headings expected or required, the “depth of indexing,” depends on the
nature and intended use of the collection (as well as the resources of the
organization paying for the work).
Information professionals oversee the addition and deletion of terms in
the controlled vocabulary. Because the list does not absorb new terms from
natural language without this support, indexers can often bring new terms or
usages to the attention of those responsible for updating a controlled vocab-
ulary. Monitoring terms used to search the collection also provides guidance
for those who control the controlled vocabulary.
5.3. Classification
Classification involves using alphabetical, numeric, or alphanumeric sys-
tems to represent information and often to allow for sequential linear
arrangement of physical documents. The codes used in classification sys-
tems typically have no inherent meaning, although sporadic attempts have
been made to design codes that include mnemonics (codes that are easy to
remember by association).

• antonyms (opposites) and other nonspecific relationships: a list of
Related Terms suggests potentially useful descriptors

Physical Disorders
RELATED TERMS
Learning Disorders
Malingering

Figure 5.1 Examples from the Thesaurus of Psychological Index Terms (2007)
1880; in 1897 it became the basis for Herbert Putnam’s Library of Congress
Classification (LCC).
The DDC is used in most school and public libraries in the U.S. and has
been adapted for libraries throughout the world. The DDC divides all knowl-
edge into 10 main classes, numbered zero through nine. Each class is divided
into 10 divisions, and each of these into 10 sections; this produces 100 divi-
sions and 1,000 sections, each identified by a three-digit number (000 to 999).
Further subdivisions are made with decimal notations (Figure 5.2).
Figure 5.2 The Dewey Decimal Classification’s use of decimal numbers for subdivision
M - Music
MT - Instruction and study
MT 539-654 - Plucked instruments
MT 588 - Guitar (self instruction)
Figure 5.3 The Library of Congress Classification’s use of letters and numbers
for subdivision
Both the DDC and the LCC are enumerative classifications: their struc-
tures attempt to provide classifications for all possible topics. This can intro-
duce problems when knowledge grows beyond the limits of the classification
structure. The DDC’s handling of religion is a good example: the 200s num-
bers (200–299) are for religion; 200–209 are for religion in general, 210s for
natural theology, and 220–289 for Christian religions. This leaves 290–299 for
“other and comparative religions,” which may have reflected U.S. library
holdings in Dewey’s day but causes problems (such as very long numbers) for
modern library collections.
The Universal Decimal Classification (UDC) was adapted from the DDC.
The original work by Belgians Paul Otlet and Henri La Fontaine began in 1895
and was published from 1904 to 1907. Since then, it has been revised and
developed under the guidance of the International Federation for
Documentation and more recently the UDC Consortium (www.udcc.org). In
addition to the Dewey-like hierarchical classification, the UDC uses auxiliary
signs to indicate various special aspects of a subject and relationships
between subjects. This ability to create and link concepts gives the UDC a
faceted element (see next section); for example, maps for mining can be rep-
resented as 622:912—the codes for “mining” and “maps” joined by a colon. It
is used worldwide, often in special libraries.
Faceted Classification
Enumerative classifications encounter three major types of problems:
One such problem is the repetition of the same characteristics throughout
the schedules, as in the DDC’s numbers for literature:

Poetry                            Drama
811 American poetry in English    812 American drama
821 English poetry                822 English drama
831 German poetry                 832 German drama
841 French poetry                 842 French drama
An item on residential care for the elderly is placed first in the “Patient” category and then in an
“Operation” subcategory. In the Social Welfare class (Q), the classmark representing this
compound class is QLV EL:
Q Social welfare
QEL Residential care
QLV Old people
Adding more detail is straightforward. For example, library provision for the elderly in
residential care combines:
Q Social welfare
QEL Residential care
QEP X Library provision
QLV Old people
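Because a faceted classmark is synthesized rather than looked up, building one can be sketched as simple string assembly. The following Python fragment mirrors the Social Welfare example above (the facet table is abridged to the codes just shown):

# Facet codes from the Social Welfare (Q) class in the example above.
facets = {
    "old people": "LV",
    "residential care": "EL",
    "library provision": "EP X",
}

def classmark(*concepts, base="Q"):
    # Synthesize a compound classmark by appending facet codes to the base class.
    return base + " ".join(facets[c] for c in concepts)

print(classmark("old people", "residential care"))  # QLV EL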
5.4. Metadata
Metadata is classically defined as “data about data” or “information about
information” (although metadata might be construed as plural, it is used as a
singular collective noun). Metadata can be described somewhat less suc-
cinctly as structured data that describes an information resource. Metadata
can be used for resource description and discovery, the management of
information resources, and their long-term preservation (Day, 2001). The
term originated in the context of electronic documents and has become
increasingly important with the advent of the World Wide Web and the
explosion of digital information. Metadata makes digital information more
easily describable and findable.
Different types of metadata can be distinguished, although the differences
may be blurry in practice. The following types are common:

• Descriptive metadata, characterizing what a resource is and is about
• Structural metadata, describing how the parts of a resource are organized
• Technological metadata, recording formats and technical characteristics
• Legal metadata, recording rights such as copyright
For example, consider a DVD set containing Season One of the television
series The X Files. Metadata about the entire set might include “box set of three
DVDs” (structural), “released January 31, 2006,” or “Run time: 1124 minutes”
(descriptive). To describe the first DVD in the set, we could say it was recorded
in NTSC format with an aspect ratio of 1.33:1 (technological). Metadata about
the content of the box set could include “starring David Duchovny and Gillian
Anderson,” “a science fiction show about investigating paranormal events”
(descriptive), as well as “Copyright 20th Century Fox” (legal).
• Separate from the resource. The records in a library catalog are not
directly linked to the items they are about.
Metadata Schemas
Different kinds of information objects require different kinds of metadata: A
scientific article has little in common with a video game. Numerous meta-
data schemas have been developed, each providing specifications that are
geared to specific applications and that describe what kinds of metadata can
or should be provided. A metadata schema should define the following:
• Which fields (elements) a record may contain
• What the precise meaning and intended use of these fields are: The
broad meaning of a field can often be guessed from its name, but
what the Dublin Core element type should include, for example, is
not immediately obvious from its name.
• Which fields are optional and which are required: A Dublin Core
user community creates an application profile and specifies which
elements are required.
Usage
On the internet, specific metadata schemas are commonly defined in XML-
related formats such as Document Type Definition (DTD) or XML Schema.
One of the most popular and simplest schemas is Dublin Core, which defines
15 elements for describing an online resource. Dublin Core metadata is
included in a webpage’s XML or HTML markup.
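As a small illustration, the Python fragment below renders a handful of the 15 Dublin Core elements as HTML meta tags of the kind embedded in a webpage’s head (the element values here are invented for the example):

# A few of the 15 Dublin Core elements, rendered as HTML <meta> tags.
record = {
    "DC.title":    "Bonanza Creek LTER 1997 Annual Progress Report",
    "DC.creator":  "Bonanza Creek Long Term Ecological Research Program",
    "DC.date":     "1997",
    "DC.type":     "Text",
    "DC.language": "en",
}

for element, value in record.items():
    print(f'<meta name="{element}" content="{value}" />')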
Library metadata is used for accessing, recording, and archiving items in
the collection. By viewing the record, typically stored in MARC format, we
can learn about the item’s title, author(s), physical size, call number, subject
(through LC Subject Headings), and so on. Digital libraries also use adminis-
trative, technical, and legal metadata (which could include management
information such as file format or copyright) and structural metadata—
which ties together disparate items into a collection.
Museums and archives use the Encoded Archival Description (EAD) meta-
data schema to encode archival finding aids using an XML DTD. EAD’s stan-
dardization makes it possible for users to search several archival collections
at the same time.
Digital archives use preservation metadata to support the long-term
preservation of digital objects. The PREMIS (PREservation Metadata:
Implementation Strategies) Data Dictionary for Preservation Metadata
defines the information needed to preserve digital objects for long periods of
time. The Metadata Encoding and Transmission Standard (METS) packages preser-
vation metadata into a standard XML container and supports the exchange
of digital objects among institutions.
Metadata has a multitude of uses in information technology and business.
For example, business intelligence applications allow businesses to examine
the various dimensions of information about their sales trends and market
competition in order to improve performance. This metadata can be stored
and manipulated in spreadsheets, relational databases, or online analytical
processing databases.
Metadata schemas exist for describing specific types of files. Adobe’s
Extensible Metadata Platform (XMP) is one example of a structured metadata
schema for describing digital images. This metadata can be embedded into
image file formats such as TIFF, EXIF, and RAW.
have already embraced the FRBR conceptual model in designing their future
systems” (p. 7).
Tagging
The examples described here are all structured metadata, which means they
have a set of rules and guidelines that must be followed. With the advent of
Web 2.0, however, internet users can “tag” information objects in an unstruc-
tured manner, with no restrictions on the format of their descriptions.
Taken together, individual tags form a folksonomy, a user-created taxonomy
of descriptive terms. Examples of popular websites using tags and folks-
onomies include Flickr for photograph sharing, Delicious for social book-
marking, and the online retailer Amazon.com.
Here are some sources for further information:
• Findability: How easy is it for users to locate what they need on the
site?
website prototypes, or usability tests (see Chapter 9). IA developers are con-
cerned primarily with whether users like and can successfully use the classi-
fications, labels, and other descriptions that have been attributed to the site’s
content.
References
Bates, M. (1989). The design of browsing and berrypicking techniques for the online search
interface. Retrieved November 11, 2010, from www.gseis.ucla.edu/faculty/bates/berry
picking.html.
Belkin, N. J. (1980). Anomalous states of knowledge as a basis for information retrieval.
Canadian Journal of Information Science, 5, 133–143.
Bowker, G. C. & Star, S. L. (2000). Sorting things out: Classification and its consequences.
Cambridge, MA: MIT Press.
Day, M. (2001). Metadata in a nutshell. Information Europe, 6(2), 11. Retrieved November 11,
2010, from www.ukoln.ac.uk/metadata/publications/nutshell.
Dilevko, J., & Dali, K. (2004). Improving collection development and reference services for
interdisciplinary fields through analysis of citation patterns: An example using tourism
studies. College & Research Libraries, 65(3), 216–241.
Eastwood, T. (1994). What is archival theory and why is it important? Archivaria, 37, 122–130.
International Federation of Library Associations and Institutions. (1997). Functional require-
ments for bibliographic records. Retrieved November 11, 2010, from www.ifla.org/files/
cataloguing/frbr/frbr_2008.pdf.
Lakoff, G. (1987). Women, fire, and dangerous things: What categories reveal about the mind.
Chicago: University of Chicago Press.
Library of Congress. (2007). MARC code list for languages: Introduction. Washington, DC:
Library of Congress Network Development and MARC Standards Office. Retrieved
November 11, 2010, from www.loc.gov/marc/languages/introduction.pdf.
Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capac-
ity for processing information. Psychological Review, 63(2), 81–97.
Ranganathan, S. R. (1967). Prolegomena to library classification. New York: Asia Publishing
House.
Rosch, E. H. (1973). Natural categories. Cognitive Psychology, 4(3), 328–350.
Rosenfeld, L., & Morville, P. (2008). Information architecture for the World Wide Web (3rd ed.).
Sebastopol, CA: O’Reilly.
Thesaurus of psychological index terms (2007). (11th ed.; L. G. Tuleya, Ed.). Washington, DC:
American Psychological Association.
Tillett, B. (2004). What is FRBR: A conceptual model for the bibliographic universe.
Washington, DC: Library of Congress. Retrieved November 11, 2010, from www.loc.gov/
cds/downloads/FRBR.PDF.
CHAPTER 6
Computers and Networks
Hardware Basics
Hardware is the foundation of computing. Its four primary functions are
input, processing, storage, and output.
Processing
The motherboard is a printed circuit board with chips, connectors, and a
power supply on it. Key components of a computer system come together,
exchange data, and process information; the motherboard functions as the
hub of all data exchange in a computer system.
The central processing unit (CPU), also called the chip, is described as the
brain of a computer. The CPU performs mathematical and logical operations
and also manages and moves information (instructions and data) as directed
by the user or software. CPU speeds are measured by the number of com-
pleted instruction cycles per second; speeds can range from 600 megahertz
(MHz; million cycles per second) to more than 4 gigahertz (GHz; billion
cycles per second).
The basic input/output system (BIOS) chip contains the boot firmware,
which controls the computer until the operating system loads. Firmware is
software written on a read-only chip. The BIOS also contains information
about the devices attached to the system, including the drives, external
buses, video card, sound card, keyboard, mouse, and printer. The BIOS
stores system configuration information, as well as the date and time, on
the CMOS (Complementary Metal Oxide Semiconductor) chip, a low-power
memory chip backed by a small battery so that its contents survive even
when the computer is without external electric power.
• The printer is an output device that accepts text and graphic data
from a computer and reproduces it as ink on paper, labels,
envelopes, or other media.
Software Basics
The instructions or programs for a computer are its software. These instruc-
tions control what the hardware does and in what sequence; the speed of
execution depends on the computer’s internal clock (a hardware component)
and may be limited by memory or storage capacity. It is customary to divide
software into three types: system software, programming software, and
application software.
System Software
System software includes two elements:
• The operating system (OS) is the link between the hardware and the
software; the basic element of the OS is the kernel, which
coordinates demands for CPU, memory, and I/O devices. Unix,
Apple OS, and Windows are examples of OSs.
• Device drivers are also part of the system software; they provide
instructions for how the computer communicates with a specific
peripheral device such as a printer or mouse.
Programming Software
Programming software supports people writing computer programs. It com-
prises three types:
Application Software
Application software allows people to use computer hardware for sophisti-
cated applications without having to learn or write the very specialized
instructions that would be required at the OS or programming levels. Word
processing, database management, graphics, and web browsing are examples.
Network Components
A computer network is composed of hardware and software that enable mul-
tiple users and computer systems to share data, software, and hardware
resources. These resources can include printers, databases, multimedia, files,
and webpages. Computer networks support communication such as email,
instant messaging, collaborative applications, and newsgroups. Figure 6.1
shows the setup for a library LAN (Berkowitz, 2007).
The components of a network are
• The application layer (Layer 7) interacts with the human user; this
layer identifies the partner(s) in the communication, establishes
resource availability, and synchronizes communication. Examples
include hypertext transfer protocol (HTTP) and file transfer
protocol.
• The data link layer (Layer 2) handles the procedural and functional
means for data transfer between network components; it may
detect and sometimes correct errors that occur at the physical
layer. Address resolution protocol is an example.
The OSI reference model is an abstraction that helps account for the vari-
ous kinds of communication needed to make a computer network function.
Different protocols work at one or more layers of the model. Each network
component functions on all seven layers, with each layer relying on those
below it and communicating with its “opposite number” from the component
with which it is interacting. In this way a personal computer can use the HTTP
(application layer) protocol to communicate with a search engine’s computer
even if each uses entirely different protocols for the lower six layers.
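This layering can be made concrete with a few lines of code. In the Python sketch below, the program itself speaks HTTP at the application layer, while the operating system’s TCP/IP stack silently supplies the transport and network layers beneath it (example.com is a stand-in host):

import socket

# The socket call asks the OS for a TCP (transport-layer) connection;
# IP routing and physical transmission happen at still lower layers.
with socket.create_connection(("example.com", 80)) as sock:
    # At the application layer we speak HTTP: a request line plus headers.
    sock.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk

print(response.split(b"\r\n", 1)[0])  # e.g., b'HTTP/1.1 200 OK'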
Networking Standards
Many highly detailed standards are required for the components of computer
networks to communicate effectively. The examples listed with the OSI refer-
ence model provide only a small sense of the nature of the standards needed.
The Computer and Communication website (www.cmpcmm.com/cc/
standards.html) provides an impressive list of networking-related standards.
To be effective, standards should be as easy as possible to implement and
should apply in as many cases as possible. Various organizations exist to
develop, maintain, modify, and in some cases enforce networking standards.
Some are international (such as the International Organization for
Standardization, which maintains the OSI reference model), and others focus
on specific countries or regions. Underwriters Laboratories, for example,
develops standards for electrical and electronic equipment in North
America. Other important organizations involved in networking standards
include the Institute of Electrical and Electronics Engineers (IEEE) and the
Internet Engineering Task Force (IETF).
The Internet
The internet is a network of computer networks. It connects computers
around the world using a standard, the Internet Protocol Suite (TCP/IP,
working at OSI reference model layers 4, transport, and 3, network).
Cloud Computing
Second-generation computers (1956–1963; the first to use solid-state tech-
nology) provided the stereotype of the mainframe computer: a large, central
machine to which programmers submitted their stacks of punched cards
with program instructions. The third generation (1964–1971; with integrated
circuits) concentrated even more processing power and speed in the central
computer. Telecommunication networks of the day allowed remote users to
interact with the mainframe; they used “dumb terminals” (for data entry and
display, but with no processing capacity). The fourth generation (beginning
in 1971; using microprocessors) has supported gradual migration of com-
puter processing power from the central mainframe to smaller computers:
minicomputers, personal (micro-) computers, and now a variety of hand-
held devices. This dispersion of computing power is sometimes called dis-
tributed computing; today’s embedded microprocessors make possible
ubiquitous computing (or ubicomp, also called pervasive computing), in
which computing power is present in everyday objects (for example, a cell
phone’s “awareness” of its geographic location).
Early versions of cloud computing began around 2001; the concept gained
traction with Amazon, Google, and IBM involvement beginning in 2005.
Google’s proposed Chrome OS is intended to support cloud computing,
allowing any personal or hand-held computer to go through the web to use
software and store data on Google’s computers. Cloud computing almost
looks like a return to the centralized, mainframe model; users access appli-
cations and storage as needed over the internet (which is often depicted as a
cloud in network diagrams; Cleveland, 2010).
Cloud computing companies provide and manage data storage and soft-
ware applications such as word processing, web-based email, database man-
agement, and inventory control. The user pays for the amount of service
used, avoids the expense of purchasing and maintaining hardware and
software, and has access from anywhere the internet is available.
Critics complain that the user’s applications and data are “hostage” to the
organization hosting the cloud and that lack of choice about applications
software limits the creativity of people who would design new applications
(Zittrain, 2009). However, many individuals and organizations appreciate the
convenience of ubiquitous access and the ability to treat computing power as
a utility rather than a capital investment.
• Back doors are easy ways for programmers to gain access to test
and debug (correct mistakes in) computer programs. Back doors
may remain after a program becomes operational, or a virus
(software code intended to damage a computer or the information
it stores) may create its own back door to allow a remote computer
access to gather information such as passwords or account
numbers.
Security Basics
The first line of defense against these challenges is a user base that is edu-
cated and aware of security issues. Precautions need not be onerous. Users
should lock or log off a computer when it is not in use. Passwords and pass-
codes are also important security devices; they should not be shared (except,
as the saying goes, “with someone with whom you would share a tooth-
brush”) and should be changed frequently. Easily remembered words (such
as a name, identification number, or even the word password) are easy to
guess, but a long string of arbitrary characters is difficult to remember and
forces people to keep a written copy near the computer—not a very secure
approach. Computer systems are designed to provide security in various
ways. Encryption, firewalls, and virtual private networks are examples.
Encryption
One way to secure data from use by unintended recipients is to encrypt the
data: to transform it so that it cannot be read unless the reader has the decod-
ing key, or cipher. Encryption is especially important if confidential data will be
transmitted over unprotected communications systems such as the internet.
Substitution and transposition are the simplest forms of encryption. In
substitution, in use since the time of Julius Caesar, each letter in a message is
replaced by a different letter. Caesar chose the letter three letters further
along in the alphabet, so D = G, I = L, N = Q, E = H, and the word dine is
encoded as glqh. With a transposition cipher, the message can be written left
to right in lines of five characters, and then rewritten for transmission read-
ing each column top to bottom.
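Both classical ciphers are easy to express in code. The Python sketch below (a toy illustration, certainly not a secure method) implements Caesar’s substitution and the five-column transposition just described:

def caesar(text, shift=3):
    # Substitute each letter with the one `shift` places along, wrapping at z.
    return "".join(
        chr((ord(c) - ord("a") + shift) % 26 + ord("a")) if c.isalpha() else c
        for c in text.lower()
    )

def transpose(text, width=5):
    # Write the message in rows of `width`, then read it off column by column.
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    return "".join(row[col] for col in range(width) for row in rows if col < len(row))

print(caesar("dine"))              # glqh
print(transpose("attackatdawnx"))  # akwtanttxadca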
Computers make much more sophisticated encryption methods possible.
The RSA algorithm is the most commonly used encryption and authentica-
tion algorithm. Named for its inventors— Rivest, Shamir, and Adleman—it is
often used to provide security in web browsers and as the basis for digital sig-
natures used to authenticate entities on the internet. RSA involves a public
key and a private key. The public key is available to anyone and is used to
encrypt messages. Only the private key is able to decrypt messages that have
been encrypted this way. The keys are generated using two large, randomly
selected prime numbers. The message recipient performs the calculations
and makes the public key available. Because only the recipient knows the pri-
vate key, and because of the amount of work required to compute the private
key, it is extremely difficult for anyone else to decode the message.
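The arithmetic behind RSA can be demonstrated with deliberately tiny primes. The Python sketch below is a toy only; real keys use primes hundreds of digits long, along with padding schemes omitted here:

# Toy RSA with small primes (p = 61, q = 53); real keys are far larger.
p, q = 61, 53
n = p * q                # 3233: the modulus, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, chosen coprime with phi
d = pow(e, -1, phi)      # 2753: private exponent (modular inverse; Python 3.8+)

message = 65                       # a message encoded as a number < n
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
plaintext = pow(ciphertext, d, n)  # decrypt with the private key (d, n)

print(ciphertext, plaintext)       # 2790 65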
Firewalls
A network firewall is hardware or software that separates a computer(s) from
a network (often the internet). Network-level firewalls typically operate at the
OSI reference model layer 3 (the network layer); they check the protocol and
destination of packets intended for the network and deny access by dropping
(deleting) packets that do not meet specified criteria. Although network-level
filtering is a good first line of defense, more sophisticated measures, at higher
OSI layers, may also be adopted. Application-level firewalls can use a proxy
server to send and receive packets, hiding the user’s internet address and
allowing time to assess the content and protocols being used, as well as the
address of the remote source.
6.4. Conclusion
Understanding how computers work requires knowledge of the physical
structures and capabilities of the hardware, as well as how these capabilities
are organized and controlled through software. Effective computer manage-
ment also requires the ability to understand and support users’ activities
while at the same time anticipating potential threats to the system.
Computing and communication technologies continue to evolve, which
means that the system manager must also find ways to keep up with
advances that improve the capability, reliability, and accessibility of these
important tools for information management.
References
Barrett, D., & King, T. (2005). Computer networking illuminated. Sudbury, MA: Jones and
Bartlett.
Berkowitz, H. C. (2007). Representative academic library LAN with external access. Retrieved
November 11, 2010, from upload.wikimedia.org/wikipedia/commons/2/27/NETWORK-
Library-LAN.png.
Cleveland, D. (2010). Cloud computing. Wikinvest. Retrieved November 11, 2010, from
www.wikinvest.com/concept/Cloud_Computing.
Computer History Museum. (2006). Exhibits: Internet history. Retrieved April 20, 2010, from
www.computerhistory.org/internet_history.
DUX Computer Digest. (2010). What is the difference between an Ethernet hub and switch?
Retrieved November 11, 2010, from www.duxcw.com/faq/network/hubsw.htm.
McDonald, C. (2010). Virtual private networks: An overview. Intranet Journal. Retrieved
November 11, 2010, from www.intranetjournal.com/foundation/vpn-1.shtml.
Walters, E. G. (2001). The essential guide to computing: The story of information technology.
Upper Saddle River, NJ: Prentice Hall.
Zittrain, J. (2009, July 19). Lost in the cloud. New York Times. Retrieved April 20, 2010, from
www.nytimes.com/2009/07/20/opinion/20zittrain.html.
CHAPTER 7
Structured Information Systems
7.1. Introduction
Consider the two concepts information and system separately. As noted in
Chapter 1, information can be anything that might inform somebody of
something. A system can be understood as a group of independent but inter-
related elements constituting a unified whole. To consider something a sys-
tem is to consider it in a particular way that emphasizes the interrelations of
the elements, working together to fulfill an overall purpose. An information
system is thus not a system processing information but rather a system
intended to inform somebody about something: an informing system.
Information systems are usually computer-based, although this is not
essential. For example, the scientific communication system, which is based
on conferences, journals, and libraries, may be understood as an information
system, whether it employs printed or digital media.
More commonly, an information system is “a computer hardware and
software system designed to accept, store, manipulate, and analyze data and
to report results, usually on a regular, ongoing basis. An [information system]
usually consists of a data input subsystem, a data storage and retrieval sub-
system, a data analysis and manipulation subsystem, and a reporting sub-
system” (Reitz, 2007).
Think back to your first cell phone directory, list of web links, collection of
digital music, or email address book. How did you collect and organize the
contents? Most such systems begin with a few easily recalled entries. As the
number grows, the system can become unwieldy, and it becomes necessary
to invest time to reorganize haphazard collections of information so that
needed items can be found and used in a more efficient way.
In a structured information system, information is analyzed and organ-
ized into component parts, which in turn may have components, which are
also organized. These are data-oriented systems, designed to facilitate data
storage and retrieval.
Structured data is the “stuff” of most contemporary computer-based
information systems, but it is important to realize the other, equally impor-
tant aspect of information science: the social and cultural network in which
all information systems are embedded. The technology that powers every
information system is structured and designed with specific users in mind,
and these users are the ultimate judges of the success or failure of the system.
This chapter examines how the elements that make up information sys-
tems—the design, structure, technical aspects—come together in an interac-
tive, fundamentally social space to give users access to information that will
serve their diverse information needs. Chapter 8, on information system
applications, builds on these concepts and examines how they have been
adapted to support capture, storage, and retrieval of more complex types of
data. Chapter 9 considers users’ perspectives on information systems.
We begin by investigating how careful analysis of information needs and
resources lays the groundwork for a functional and maintainable informa-
tion system. This initial design is the basis for the next step, conceptual mod-
eling of the system components. We then consider the relational data model,
the basis for almost all modern database systems.
7.3. Design
Traditional approaches to design, such as the system development life cycle,
tend to follow a waterfall model, with results from one step forming the basis
for the next. The steps in this process define a rigid, linear sequence of events
that, if successful, result in the completion of the information system.
Critics of the waterfall model contend that system design should be like a
spiral, with feedback from each step informing design and improvement
throughout the process. Newer versions are explicitly iterative, meaning that
the system is never truly finished, but rather that each completed model is
always awaiting evaluation and redesign.
ideas about the world—how it might be organized and how it might work.
Models describe relationships: parts that make up wholes; structures that
bind them; and how parts behave in relation to one another” (Dubberly,
2009, p. 54). A conceptual data model names the items of significance for the
information system and characteristics (attributes) of and associations (rela-
tionships) among those items.
The most widely accepted standard for modeling systems is the Unified
Modeling Language (UML), “a non-proprietary, third generation modeling
language. The UML is an open method used to specify, visualize, construct,
and document the artifacts of an object-oriented software-intensive system
under development. The UML represents a compilation of ‘best engineering
practices’ which have proven successful in modeling large, complex systems”
(Unified modeling language, 2010).
Designers employ various models to develop effective database systems:
• User interface models establish how users will see and interact with
the data. The Unified Modeling Language for Interactive
Applications integrates user modeling facilities into UML. It can be
used to represent the user interface (Stephens, 2009).
well, semantic object models do not depict all the possible details
of an information system (Stephens, 2009).
7.5. Databases
A database is a collection of data that is stored to facilitate addition, updating,
deletion, and access. A database is often implemented using specific software
called a database management system.
Database Functionality
CRUD is the mnemonic acronym that stands for the four fundamental oper-
ations any database should support: Users should be able to create, read,
update, and delete information contained in the database (Stephens, 2009).
To accomplish these tasks, the system must support retrieval: users should
be able to locate efficiently every piece of data contained in the database.
Databases must be accurate and consistent. Therefore, the system
emphasizes the validity of the content: whenever a record is created,
updated, or deleted, the database checks or verifies that the information is of
the correct type, represents permissible values, and is in an acceptable state.
In addition, good databases should provide easy error correction, efficient
operation, extensibility as data and number of users increase, and security. If
many users will need access to the same data from multiple locations, data
sharing might also be important.
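The four CRUD operations map directly onto SQL statements. A minimal sketch using Python’s built-in sqlite3 module (the table and column names are invented for the example):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

# Create, Read, Update, Delete
conn.execute("INSERT INTO students (id, name) VALUES (?, ?)", (1, "Ada Lovelace"))
print(conn.execute("SELECT name FROM students WHERE id = ?", (1,)).fetchone())
conn.execute("UPDATE students SET name = ? WHERE id = ?", ("Ada King", 1))
conn.execute("DELETE FROM students WHERE id = ?", (1,))
conn.commit()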
Data Models
“A data model is a framework for describing the structure of databases. A
schema is the structure of a particular database” (Sciore, 2009, p. 7). Of the
many data models that have been developed, four are in common use today:
• Flat files contain text only, although text may include characters
(such as commas or tabs) that can provide some structure to data.
eXtensible Markup Language (XML) files are an example.
• Spreadsheets also store data in tabular form and allow for data
manipulation, but do not allow for complex relationships among
data elements.
semesters. Having a key for each item is also essential for normalization, the
process of removing redundant information from a database.
Information in a database is often split into multiple tables to help with
the organization of resources, prevent individual tables from becoming too
large, improve processing efficiency, and ensure the security of records. For
example, the data regarding students’ grades might be recorded as in Table
7.5. Only the essential data appears in Table 7.5, making it more efficient and
more secure.
Now we begin to see how records in a relational database are “related” to
each other—if records in one table contain primary keys for records in
another table, these “connected” fields in the first table can serve as foreign
keys leading us to the corresponding data in the second table. This allows the
database to retrieve data from more than one table in response to a query.
Consider Table 7.6, containing information about the professors who
taught the courses. We can use this with our other tables to retrieve informa-
tion from the relational database.
the attribute “Course ID.” Table 7.5 and Table 7.6 also both contain the attri-
bute “Course ID.” Figure 7.3 shows how a join on all the tables would work.
Table 7.7 shows the complete listing of students, courses, professors, and
grades assembled from the relationships in all these tables.
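In SQL, these foreign-key relationships become join conditions. The sketch below, again using Python’s sqlite3 module, assembles a combined listing in a single query; the table and column names are hypothetical stand-ins for Tables 7.4 through 7.7:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grades     (student_id INT, course_id INT, grade TEXT);
    CREATE TABLE courses    (course_id INT, title TEXT);
    CREATE TABLE professors (course_id INT, professor TEXT);
    INSERT INTO grades     VALUES (1, 101, 'A');
    INSERT INTO courses    VALUES (101, 'Information Retrieval');
    INSERT INTO professors VALUES (101, 'Prof. Salton');
""")

# Course ID acts as the foreign key linking all three tables.
rows = conn.execute("""
    SELECT g.student_id, c.title, p.professor, g.grade
    FROM grades AS g
    JOIN courses AS c    ON g.course_id = c.course_id
    JOIN professors AS p ON g.course_id = p.course_id
""").fetchall()
print(rows)  # [(1, 'Information Retrieval', 'Prof. Salton', 'A')]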
References
Anthes, G. (2010). Happy birthday, RDBMS! Communications of the ACM, 53(5), 16–17.
Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of
the ACM, 13(6), 377–387.
CHAPTER 8
Information System Applications
In this approach the searcher selects a threshold value, which remains
constant throughout the search. Weights relative to this threshold value are
assigned to each term in the query. If the weight assigned to a term is equal
to or greater than the threshold value, documents indexed under that term
will be retrieved. If the weight assigned to the term is less than the threshold
value, the documents indexed under the term will not be retrieved unless the
term appears in combination with other terms and the sum of the terms’
weights is equal to or greater than the threshold value.
Negative threshold values have their uses, too. To retrieve documents
indexed under term A or term B but not both, a negative threshold value is
selected, and the weights of terms A and B can be set equal to it. If either of
the desired terms appears by itself, then the document will be retrieved
because the value of the term will equal the threshold value. However, if both
terms appear, the document will not be retrieved because their combined
weight will be less than the threshold value. This is the weighted-term equiv-
alent of the Boolean exclusive OR.
The sum of the weights of the search terms in a document can be thought
of as a kind of document weight. Sorting the retrieval set by document
weights makes it possible to display the items retrieved in decreasing order of
their probable relevance. Thus, if several hundred documents are likely to
satisfy a given search request, the searcher will see the most promising doc-
uments first, rather than a random sample.
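A short sketch makes the mechanics concrete (Python; the postings and weights are invented). Note how a negative threshold with equal negative weights produces the exclusive-OR behavior described above:

def retrieve(postings, weights, threshold):
    # Sum the weights of the query terms found in each document and
    # return the documents whose totals meet the threshold.
    scores = {}
    for term, weight in weights.items():
        for doc in postings.get(term, ()):
            scores[doc] = scores.get(doc, 0) + weight
    return [doc for doc, total in scores.items() if total >= threshold]

postings = {"A": ["d1", "d3"], "B": ["d2", "d3"]}

# Exclusive OR: documents indexed under A or B but not both.
print(retrieve(postings, {"A": -1, "B": -1}, threshold=-1))  # ['d1', 'd2']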
A widely used weighting scheme, tf × idf, combines two measures. The
term frequency measures how prominent a term is within a particular
document:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

where $n_{i,j}$ is the number of times term $t_i$ occurs in document $d_j$, which is
divided by the sum of the occurrences of all terms in the document.

The inverse document frequency measures the general importance of the
term in the database. We divide the number of all documents by the number
of documents containing the term and take the logarithm of that quotient:

$$\mathrm{idf}_i = \log \frac{|D|}{\left|\{\, d_j : t_i \in d_j \,\}\right|}$$

where $|D|$ is the total number of documents and the denominator counts the
documents that contain term $t_i$. The weight of term $t_i$ in document $d_j$ is
then the product $\mathrm{tf}_{i,j} \times \mathrm{idf}_i$.
With the use of tf × idf, a query can retrieve documents ranked not by
whether a term occurs but by how much weight that term has in each docu-
ment. Enhancements and extensions to tf × idf include calculating frequen-
cies for multiword terms, identifying synonyms in the database, and using
feedback from the searcher to improve search results.
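A direct implementation of these formulas over a toy collection (a minimal sketch; production systems add the enhancements just mentioned):

import math

docs = {
    "d1": "information retrieval systems".split(),
    "d2": "automatic indexing of information".split(),
    "d3": "classification of library materials".split(),
}

def tf(term, doc):
    # Occurrences of the term divided by all term occurrences in the document.
    return docs[doc].count(term) / len(docs[doc])

def idf(term):
    # Log of (number of documents / number of documents containing the term).
    containing = sum(1 for words in docs.values() if term in words)
    return math.log(len(docs) / containing)

print(round(tf("information", "d1") * idf("information"), 3))  # 0.135
print(round(tf("indexing", "d2") * idf("indexing"), 3))        # 0.275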
The searcher can counteract these problems by cutting back on the use of the
offending type of operator or by introducing an operator having the opposite
effect. Before changing the terms in the query, examine how the existing
terms have been coordinated.
Retrieving nothing (the null set) does not necessarily indicate a bad
search. For example, an inventor wishing to establish the novelty of a new
device would be delighted to find that there are no patents for similar
devices. Similarly, doctoral students would be glad to learn that no disserta-
tions have been written on their chosen topics. Achieving search results that
are 100 percent relevant should suggest that something may have been
missed because the search strategy was too narrow.
6. The need for more complex, robust models for system design and
evaluation: Cognitive science provides a useful basis, but the
activities involved in relevance assessment are complex (see
Chapter 3, section on information seeking).
Much of the research and development for web search engines such as
Google is proprietary. However, one can infer that a blend of techniques has
been employed, including assessing the number and quality of links to a
given website or page (Brin & Page, 1998). Using such links is similar to track-
ing citations in the print literature to establish connections among journal
articles (see Chapter 11, sections on bibliometrics and webometrics).
Current academic research on information retrieval is best exemplified by
the Text REtrieval Conference (TREC; trec.nist.gov), sponsored by the U.S.
National Institute of Standards and Technology (NIST) and the U.S.
Department of Defense. TREC prepares large data sets in specific areas of
focus, called tracks. The tracks represent real-world retrieval problems in
areas such as blogs, web searching, and chemical information retrieval.
Participants prepare their information retrieval systems to use the NIST data
and answer questions NIST selects. The results from each information
retrieval system are compared and discussed to provide a basis for improving
retrieval technologies and techniques.
Electronic resources management (ERM) involves functions such as:
• Selection
• Trial setup
• Selection approval
• Licensing
• Ordering
• Billing
• Cataloging
• Access activation
• Troubleshooting
These functions form the ERM life cycle, from selection, through access
activation and monitoring, to renewal or cancellation. The life cycle can be
seen as a progression, with functions close to each other performed by a sin-
gle administrator or electronic resources librarian. All of these ERM functions
can be part of an electronic resources librarian’s job description; overlap
between the clusters of functions performed by electronic resource librarians
is common.
Electronic resources may be acquired as one-time purchases, through a
subscription, or as a combination of subscription and one-time purchase
(for example, recent volumes of an electronic journal acquired as a subscrip-
tion, and archive or backfile volumes acquired as a one-time purchase).
Some ebooks appear as monographs—one-time publications, complete
as issued. Databases and electronic journals are generally handled as contin-
uing publications, with no planned date to cease publication. The model for
other resources is unclear: An online encyclopedia is monographic when
handled as a one-time purchase but has continuing access when purchased
on the subscription model (e.g., print and online purchase with online access
maintained only if the institution continues to purchase the annual print
update). Book series (e.g., Springer’s Lecture Notes in Computer Science) can
be monographic, when each constituent book has a separate access point, or
a continuing publication, when made available as a series.
ERM is both an extension of traditional monographic and serials acquisi-
tions processes and a new area for library management. New responsibilities
include negotiating licensing agreements, supporting electronic access, and
monitoring usage.
Licensing
Farb (2006) quotes the director of the University of Wisconsin–Madison
library as saying, “Libraries are agreeing to licenses that provide no guaran-
tee of continued access to the content if the subscription ends. What this
means, of course, is that universities are only renting this information.” When
libraries purchase printed material, they own the items purchased. With elec-
tronic information sources, however, the library typically leases access for a
specified period; even if a database is supplied to the institution and
mounted on its computer, the content usually belongs to the database
provider and must be returned when the contract expires. When the database
resides on the provider’s computer and library users have only the right of
access, the library has even less control over the information. As Farb
observes, such arrangements have major implications for libraries’ tradi-
tional responsibilities for preserving information.
Libraries and publishers are exploring ways to preserve born-digital
scholarly publications. Portico (www.portico.org) and LOCKSS (Lots of
Copies Keeps Stuff Safe; www.lockss.org) make arrangements to store
archival copies of electronic publications, assuring that they will be available
for future scholars.
Both librarians and publishers are investing considerably more time in
negotiating and maintaining license agreements than they did in the print
era. Libraries may be bound by state law or institutional practice to require
special wording or exceptions to publishers’ standard contracts. Hahn (2007)
describes the Shared Electronic Resource Understanding project as an alter-
native to licensing. It “expresses commonly shared understandings of the
content provider, the subscribing institution and authorized users; the
nature of the content; use of materials and inappropriate uses; privacy and
confidentiality; online performance and service provision; and archiving and
perpetual access” (Hahn, 2007, paragraph 1).
Monitoring Usage
Many license agreements limit the number of simultaneous users or the
amount of material that may be downloaded (copied to the user’s computer).
Farb (2006) notes that the established U.S. limit on copyright through the fair
use exemption is absent in many standard license agreements. These agree-
ments may also interfere with “the right of every individual to both seek and
receive information from all points of view without restriction,” as the
American Library Association (2009) defines intellectual freedom.
Information professionals may feel awkward about being placed in the posi-
tion of overseeing who uses resources, how much, and for what purposes.
Although monitoring is typically done automatically, the information profes-
sional will be called on to resolve cases in which the vendor alleges that
someone has misused a resource.
Aggregated usage statistics showing, for example, how many sessions
were conducted with various vendors in the past year can help electronic
resources managers decide whether licenses for more (or fewer) simultane-
ous users should be purchased, which agreements to renew or cancel, and
which resources to promote, among other things.
As libraries transition from a mainly print to an online environment,
new methods, tools, and access options continue to be developed. ERM is
approached differently from one institution to another. A single electronic
resources librarian or a group coordinated by such a librarian is common;
libraries allocate ERM responsibilities in various ways, and some functions
may be handled by other departments. In the library’s organizational struc-
ture, the electronic resources librarian may be part of the public services or
acquisitions department; in some cases ERM is handled collaboratively by
various library departments, with no designated electronic resources
librarian.
Figure 8.4 Early graphical interface (Xerox Star, circa 1979). Note the use of
icons, folders for directories and multiple windows; these
information visualization metaphors are now ubiquitous on
computing platforms. (Used with permission of PARC,
www.parc.com)
Figure 8.5 Display of the words in this chapter, created by Wordle (www.wordle.
net). Font size indicates frequency of use.
References
Abrams, J. (1994). Muriel Cooper’s visible wisdom. AIGA, The Professional Association for
Design. Retrieved November 11, 2010, from www.aiga.org/medalist-murielcooper.
Adams, R. M., Stancampiano, B., McKenna, M., & Small, D. (2002, October). Case study: A vir-
tual environment for genomic data visualization. Paper presented at the 13th IEEE
Visualization Conference, Boston, MA.
American Library Association. (2009). Intellectual freedom and censorship Q & A. Retrieved
November 11, 2010, from www.ala.org/ala/aboutala/offices/oif/basics/ifcensorship
qanda.cfm.
Arnheim, R. (2004). Visual thinking. Berkeley: University of California Press.
Bertin, J. (1983). Semiology of graphics (W. J. Berg, Trans.). Madison: University of Wisconsin
Press.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine.
Seventh International World-Wide Web Conference. Retrieved November 11, 2010, from
ilpubs.stanford.edu:8090/361.
Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in information visualization:
Using vision to think. San Francisco: Morgan Kaufmann.
Cleverdon, C., & Keen, M. (1966). Factors determining the performance of indexing systems:
Test results. Cranfield, UK: Aslib Cranfield Research Project.
Davis, C. H., & McKim, G. W. (1999). Systematic weighting and ranking: Cutting the Gordian
knot. Journal of the American Society for Information Science, 50(7), 626–628.
Davis, C. H., & Rush, J. E. (1979). Guide to information science. Westport, CT: Greenwood.
Farb, S. (2006). Libraries, licensing, and the challenge of stewardship. FirstMonday, 11(7).
Retrieved November 11, 2010, from firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/
fm/article/view/1364/1283.
128 Introduction to Information Science and Technology
Froehlich, T. J. (1994). Relevance reconsidered: Towards an agenda for the 21st century:
Introduction to special topic issue on relevance research. Journal of the American Society
for Information Science, 45(3), 124–134.
Gombrich, E. (1995). The story of art (16th ed.). New York: Phaidon.
Hahn, K. L. (2007). SERU (Shared Electronic Resource Understanding): Opening up new pos-
sibilities for electronic resource transactions. D-Lib Magazine, 13(11/12). Retrieved
November 11, 2010, from dlib.org/dlib/november07/hahn/11hahn.html.
Korfhage, R. (1997). Bibliography on first phase information visualization. Retrieved
November 11, 2010, from www.asis.org/SIG/SIGVIS/hostedMaterial/bibliography.pdf.
Lynch, C. (2005). Where do we go from here? The next decade for digital libraries. D-Lib
Magazine, 11(7/8). Retrieved November 11, 2010, from www.dlib.org/dlib/july05/lynch/
07lynch.html.
Maeda, J. (2001). Design by numbers. Boston: MIT Press.
Massachusetts Institute of Technology Architecture Machine Group. (1981). Interactive
movie map. Retrieved July 25, 2009, from www.naimark.net/projects/aspen/aspen_
v1.html.
Panofsky, E. (1972). Studies in iconology: Humanistic themes in the art of the Renaissance.
New York: Harper & Row.
Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the
notion in information science. Part II: Nature and manifestations of relevance. Journal of
the American Society for Information Science and Technology, 58(13), 1915–1933.
Shneiderman, B. (2003). Craft of information visualization: Readings and reflections. New
York: Morgan Kaufmann.
Tufte, E. (1983). The visual display of quantitative information (2nd ed.). Cheshire, CT:
Graphics Press.
Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Vickery, B. C. (1959). Subject analysis for information retrieval. Proceedings of the
International Conference on Scientific Information (Vol. 2, pp. 855–866). Washington, DC:
National Academy of Sciences. Retrieved November 11, 2010, from www.nap.edu/open
book.php?isbn=NI000518&page=855.
Wilson, P. (1973). Situational relevance. Information Storage & Retrieval, 9(8), 457–471.
CHAPTER 9
Evaluation of
Information Systems
Self-Studies
In a self-study, designers and developers inspect a system to evaluate
whether it satisfies system requirements, best practice standards, or both.
Techniques include walkthrough inspections, system simulations, and heuristic evaluations.
Laboratory Studies
Laboratory studies can help generate insights into a system’s usability and
effectiveness (e.g., Wixon & Wilson, 1997), participants’ attitudes toward the
system (e.g., Sonnenwald et al., 2003), or potential system impact on task or
work processes and performance (e.g., Söderholm et al., 2008). Thus, these
studies can support both summative evaluation, providing feedback on a
current design, and formative evaluation, providing feedback on possible
improvements for future versions of a system. The latter is particularly useful
in an iterative design-evaluation approach. Evaluation data collected in laboratory studies may include measures of task performance, system effectiveness, and participants' attitudes.
Field Studies
A field study is a semi-structured period of observation of users in their nat-
ural environment. Participants are observed using the system in the context
of their everyday lives or work. Examples of field studies include work by Xie
(2008) and Walsh, Kucker, Maloney, and Gabbay (2000). These studies investigate patterns of adoption, adaptation, and nonadoption of systems, as well as the relationships among technologies, task and work processes, and outcomes. Evaluation data collected in field studies may include
• Computer logs: Typically, computer logs record the exact time and
content of each user entry and system response; for web use, logs
can show each website visited.
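As a minimal sketch of how such logs might be analyzed (the log format, field layout, and data below are invented for illustration; real logging systems vary widely), a few lines of Python can aggregate the websites visited:

    from collections import Counter

    # Hypothetical tab-separated log lines: timestamp, user ID, URL visited.
    log_lines = [
        "2010-11-11T09:15:02\tuser42\thttp://www.example.edu/catalog",
        "2010-11-11T09:16:40\tuser42\thttp://www.example.edu/search",
        "2010-11-11T09:17:05\tuser07\thttp://news.example.com/",
    ]

    site_visits = Counter()
    for line in log_lines:
        timestamp, user, url = line.split("\t")
        host = url.split("/")[2]  # the site (host) portion of the URL
        site_visits[host] += 1

    print(site_visits.most_common())  # [('www.example.edu', 2), ('news.example.com', 1)]

The same timestamps would let an evaluator reconstruct session lengths or the delay between a user entry and the system response.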
Usability Perspective
In summary, the design of an evaluation should take into account the type
of information system to be evaluated, the purposes of the evaluation, the
context in which the system will be used, the specific tasks or work processes
it will support, and the resources (including time, money, and expertise)
available to conduct the evaluation. There will always be trade-offs and com-
promises when designing an evaluation. Each evaluation approach has
strengths and weaknesses; the challenge is to determine which approach, or
combination of approaches, is best considering the purposes, context, and
resources available for the evaluation.
Venda and Venda (1995) analyzed ergonomics studies to develop the law
of mutual adaptation, which postulates that system users will perform best
when the computer system’s capabilities match the cognitive skill structures
and behavior strategies of the human user. Efficiency gains are subject to
diminishing returns because, as a user develops more advanced cognitive
skill structures, he or she can find additional strategies to perform the same
task. Through its interface, a “mutually adaptive” system can support these
new skills and strategies, thus increasing the user’s efficiency in performing
the task (Carey, 1997, p. 5).
Studies of users (or people similar to intended users) and experience in
adapting systems to fit people (instead of the other way around) encouraged
system designers to
• Focus on user(s) and task(s) early in system design: Who will use
the system, how often will they use it, and what tasks will they do
most often? (A data entry system to be used by experts 8 hours a
day should support quick access to common tasks. The infrequent
user posing an occasional query to the same system will need
more support from the interface; speed of access can be sacrificed
for a more “chatty” interaction.)
A good design can make an interface easier to learn and faster to use, can
reduce errors, and can increase the users’ sense of satisfaction. Wickens, Lee,
Liu, and Gordon Becker (2004) derived 13 design principles, which they
grouped according to a focus on perception, mental models, attention, or
memory:
Perceptual principles:
10. Use multiple senses. A user can more easily process information distributed across different sensory resources; for example, visual and auditory information can be presented simultaneously rather than presenting only visual or only auditory information.
Memory principles:
12. Provide predictive aids. Proactive actions are usually more effective
than reactions. A display should attempt to eliminate
resource-demanding cognitive tasks and replace them with
simpler perceptual tasks in order to reduce the use of the user’s
mental resources. Doing so will allow the user not only to focus on
current conditions but also to think about possible future
conditions. An example of a predictive aid is a road sign displaying
the distance from a certain destination.
13. Provide consistency. Old habits from other displays will easily
transfer to support processing of new displays if the displays are
designed in a consistent manner. A user’s long-term memory will
trigger actions that the user expects to be appropriate. A design
must accept this fact and use consistency among different
displays.
Usability
It is important to understand usability in order to enhance the functionality
and the acceptance of information systems. The International Organization
for Standardization (1994) holds that usability is “the extent to which a prod-
uct can be used by specified users to achieve specified goals with effective-
ness, efficiency, and satisfaction in a specified context of use” (p. 10).
Jakob Nielsen (1993), a well-known usability expert, identified five attri-
butes of usability:
• Learnability
• Efficiency
• Memorability
• Handling of errors
• Satisfaction
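The ISO definition suggests straightforward operational measures. The sketch below is only illustrative (the session data and the particular operationalizations are assumptions, not prescribed by ISO or Nielsen): effectiveness as task completion rate, efficiency as mean time per completed task, and satisfaction as a mean rating:

    # Invented usability-test sessions:
    # (task completed?, seconds taken, satisfaction rating on a 1-5 scale)
    sessions = [(True, 95, 4), (True, 120, 5), (False, 240, 2), (True, 80, 4)]

    completed = [s for s in sessions if s[0]]
    effectiveness = len(completed) / len(sessions)               # completion rate
    efficiency = sum(s[1] for s in completed) / len(completed)   # mean time, completed tasks
    satisfaction = sum(s[2] for s in sessions) / len(sessions)   # mean rating

    print(f"effectiveness {effectiveness:.0%}, "
          f"efficiency {efficiency:.0f} s/task, satisfaction {satisfaction:.1f}/5")
    # effectiveness 75%, efficiency 98 s/task, satisfaction 3.8/5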
4. The end of the ephemeral with the growth of each person’s “digital
footprint,” consisting of information about where we are and what
we purchase that would formerly have been discarded
Sellen and colleagues (2009) would redefine HCI. Humans are not just
users of computers; as consumers, creators, and producers, they value aes-
thetic and emotional aspects of their interactions with technology.
Computers today are digital technologies embedded in our world; comput-
ers also rely on this embedded infrastructure, so the “C” in HCI must expand
to comprehend network connections as well. Finally, the interaction may be
within a person’s body, between bodies, between a body and an object (not
just by typing or mousing), or among many bodies and objects—for example,
in a public space. Sellen and colleagues concluded that “the conception of
technology use as a conscious act becomes difficult to sustain” and “HCI
must take into account the truly human element, conceptualizing ‘users’ as
embodied individuals who have desires and concerns and who function
within a social, economic, and political ecology” (p. 66).
References
Adams, C. (2009a). Ergonomics. About.com. Retrieved November 11, 2010, from ergonomics.about.com/od/glossary/g/defergonomics.htm.
Adams, C. (2009b). Human factors. About.com. Retrieved November 11, 2010, from ergonomics.about.com/od/glossary/g/defhumanfactors.htm.
Andriessen, J. H. E. (1996). The why, how and what to evaluate of interaction technology: A review and proposed integration. In P. J. Thomas (Ed.), CSCW requirements and evaluation (pp. 107–124). London: Springer Verlag.
Association for Computing Machinery Special Interest Group on Computer-Human Interaction Curriculum Development Group. (1996). Curricula for human-computer interaction. Retrieved May 10, 2011, from old.sigchi.org/cdg.
Bennett, J. L. (1972). The user interface in interactive systems. Annual Review of Information
Science and Technology, 7, 159–196.
Bennett, J. L. (1979). The commercial impact of usability in interactive systems. In B. Shackel
(Ed.), Man-computer communication, infotech state-of-the-art (Vol. 2, pp. 1–17).
Maidenhead, UK: Infotech International.
Blandford, A., Keith, S., Connell, I., & Edwards, H. (2004, June). Analytical usability evaluation
for digital libraries: A case study. Proceedings of the ACM/IEEE Joint Conference on Digital
Libraries, Tucson, AZ. DOI:10.1109/JCDL.2004.1336093.
Campbell, N. (2001). Usability assessment of library-related web sites: Methods and case stud-
ies. Chicago: Library Information Technology Association, American Library Association.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction.
Hillsdale, NJ: Erlbaum.
Carey, J. M. (Ed.). (1997). Human factors in information systems: Relationship between user
interface design and human performance. Norwood, NJ: Intellect Books.
Connell, I., Green, T., & Blandford, A. (2003). Ontological sketch models: Highlighting user-
system misfits. In E. O’Neill, P. Palanque, & P. Johnson (Eds.), People and computers XVII:
Designing for society: Proceedings of HCI 2003 (pp. 163–178). London: Springer.
Eason, K. D. (1981). A task-tool analysis of manager-computer interaction. In B. Shackel
(Ed.), Man-computer interaction: Human factors aspects of computers and people (pp.
289–307). Rockville, MD: Sijthoff and Noordhoff.
Hom, J. (2000). Usability methods toolbox: Heuristic evaluation. Retrieved November 11,
2010, from usability.jameshom.com/heuristic.htm.
International Ergonomics Association. (2009). What is ergonomics? Retrieved July 31, 2009,
from www.iea.cc/01_what/What%20is%20Ergonomics.html.
International Organization for Standardization. (1994). Ergonomic requirements for office work with visual display terminals. Part 11: Guidance on usability (ISO DIS 9241-11). Geneva: International Organization for Standardization.
Jeng, J. (2006). Usability of the digital library: An evaluation model. Unpublished doctoral
dissertation, Rutgers University, New Brunswick, NJ.
Ju, B. (2007). Does domain knowledge matter: Mapping users’ expertise to their information
interactions. Journal of the American Society for Information Science and Technology,
58(13), 2007–2020.
Lewis, C., & Wharton, C. (1997). Cognitive walkthroughs. In M. G. Helander, T. K. Landauer,
& P. V. Prabhu (Eds.), Handbook of human-computer interaction (pp. 717–732). New York:
Elsevier Science.
Nielsen, J. (1993). Usability engineering. Cambridge, MA: Academic Press.
Peters, C. (2002). Evaluating cross-language systems the CLEF way. Cultivate Interactive, 6.
Retrieved November 10, 2010, from www.cultivate-int.org/issue6/clef.
Pinelle, D., & Gutwin, C. (2000). A review of groupware evaluations. Proceedings of the IEEE
International Workshops on Enabling Technologies: Infrastructure for Collaborative
Enterprises 2000, 86–91. DOI: 10.1109/ENABL.2000.883709.
Sellen, A., Rogers, Y., Harper, R., & Rodden, T. (2009). Reflecting human values in the digital
age. Communications of the ACM, 52(3), 58–66.
CHAPTER 10
Information Management
10.1. Introduction
In the 1970s, the information science community explicitly enlarged its
scope to consider not only information storage and retrieval but also the
developments in information creation, management, and policy that were
evident, especially with the development of more capable technologies.
Information came to be seen as a resource—one with both costs and benefits for an organization. Macevičiūtė and Wilson (2002) have described four consequences.
When the journal Information Storage & Retrieval changed its title to
Information Processing & Management, its editor noted, “the information
needs of research, management, and policy-making emerge as critical
requirements, and effective access to information from many disciplines
and from many parts of the world becomes imperative. Thus we must view
information processing and management as an integral part of overall pub-
lic policy-making, linked to social and economic affairs as well as to science
and technology” (Fry, 1975, p. i). Library and information science educators
also broadened their focus to prepare graduates for information manage-
ment positions in the private sector (Wilson, 1989); some schools changed
their names to include information management as well.
The term information management is ambiguous, but in information science settings, it often connotes an explicit management (often business) perspective. In 2000 Macevičiūtė and Wilson (2002) reviewed the content of six areas of information management research:
• Artificial intelligence
• Economics of information
• Information professionals
• Information systems
• Knowledge management
• Telecommunication industry
History of SI
Early, engineering-based approaches to understanding the interactions
between humans and computers investigated new technologies, but a few
researchers looked at social impacts, for example, privacy (Westin & Baker,
1972). Bell (1973) took a broad, informatics perspective on society and the
impact of computers. At the University of California–Irvine, the URBIS
Group’s studies of how computers affected local government in the 1980s
helped invent research that would look beyond the engineering perspective
to analyze qualitative data about social background and behavior; Dutton
(2005) claimed that Rob Kling coined the term social informatics while he
worked with the URBIS Group.
Researchers and developers in Scandinavian countries, Great Britain, and
northern Europe also wanted more than simplistic predictions or models of
the likely social impacts of information tools. They sought to go beyond
socially or technologically deterministic theories and look equally at social
issues and technology. Their research began to focus on both the surround-
ing social context and the mechanical properties of information systems.
By the 1990s, researchers in various fields began to recognize the inaccu-
racy of many predictions about the social effects of specific information and
communication technologies (ICTs); careful study revealed that the prognos-
ticators often used oversimplified conceptual models of specific kinds of ICTs
or of the nature of the relationship between technology and social change.
For example, Suchman (1996) studied a plan by a group of attorneys to
develop an expert system that would code documents in preparing civil liti-
gation; she found that the human coders’ work required more complex judg-
ments than an expert system could handle and recommended that the
coders be supported, rather than replaced, by the new system.
In 1996, researchers interested in “the interdisciplinary study of the
design, uses, and consequences of ICTs that takes into account their interac-
tion with institutional and cultural contexts” (Kling et al., 2005, p. 6) selected
SI as the name for their field. It represented international studies with vari-
ous information technology names, or simply informatics in Europe. Some of
the terms and phrases replaced by SI are new media, compunications, télé-
matique (French), informatique (French), social impacts (or analysis) of com-
puting, and computer-mediated communication studies. Centers for SI have
been established at Indiana University at Bloomington and at Napier
University in Edinburgh, Scotland.
Knowing how to build information systems and engineer communica-
tions without understanding the work practices and social context of users
can lead to waste and problems. Empirical evidence from more than 30
years of research supports this conclusion, but because these studies appear
in such a variety of sources, many systems developers do not know they exist.
SI researchers study technology in the context of human organizations
and institutions. Instead of asking deterministic questions such as how new
technologies—such as wireless handsets and smartphones—will change
people, SI asks about the impacts of individuals’ use of technology on the
social groups in which they participate. And, in the other direction, how do
groups influence technological developments? Applying SI in business settings remains challenging in a world of academic silos and technology-driven markets.
Table 10.1 (adapted from Kling et al., 2005, p. 42) contrasts traditional,
engineer-based perspectives with social design views of research and
applications.
Note that explicit knowledge involves objective, technical, and reasoned
knowledge, such as policies, procedures, data, and documents; tacit knowl-
edge, however, is subjective and based on experience and personal cognition.
Researchers have used the SI perspective on system design and evaluation
in a wide range of organizations and institutions. The following examples
show how SI contributes to understanding of information dissemination in
organizations, product development, and strategic and business intelligence
(information about other organizations, often competitors).
Information Dissemination
Traditional approaches to track and illustrate the spread of information in an
organization use data flow diagram techniques. Data flow diagrams provide
models of business processes and the flow of data, such as the origins and
sharing of a purchase order. By illustrating where information is sent and
received, data flowcharts reveal the structure of an organization. Institutional
policies and business processes are also reflected in diagrams or models of
where data is stored and when it flows. Figure 10.1 shows the basic data flow
diagram for someone who loans videos to friends (and keeps very complete
records of the transactions).
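The same video-lending scenario can be expressed in code. The following sketch is illustrative only (the store names follow Figure 10.1; the data and behavior are invented): the three data stores become simple in-memory structures, and the two processes become functions that read from and write to them:

    # Data stores (Ex1, Ex2, Ex3 in Figure 10.1), modeled as in-memory structures.
    video_collection = {"Casablanca": "available", "Metropolis": "on loan"}  # Ex1
    friend_listing = {"Ada", "Boris"}                                        # Ex2
    request_list = []                                                        # Ex3

    def inquiry(title):
        """Process 1: a Friend inquires about a video (reads the Video Collection)."""
        return video_collection.get(title, "not in collection")

    def requests(friend, title):
        """Process 2: a Friend asks to borrow a video (uses all three stores)."""
        if friend not in friend_listing:
            return "unknown friend"
        if video_collection.get(title) != "available":
            return "not available"
        video_collection[title] = "on loan"
        request_list.append((friend, title))  # record the transaction
        return f"{title} loaned to {friend}; return agreement issued"

    print(inquiry("Casablanca"))          # available
    print(requests("Ada", "Casablanca"))  # Casablanca loaned to Ada; return agreement issued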
In comparison, SI research has led to new views of information dissemi-
nation and social connections. Social network analysis has been one of the
most influential research perspectives; it represents the relationships among
social entities so that the patterns and their implications can be studied.
Relationships can be built on economic, political, personally interactive, and
affective connections. Figure 10.2 illustrates the social network and sharing
of information—the interactions—among seven student nurses blogging
about work and healthcare issues (Swain, 2006). Thicker lines indicate more
sharing of information.
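Behind such a diagram is a weighted graph. A minimal sketch (the names and weights below are invented, not Swain's data) represents each pair of bloggers as an edge whose weight is the amount of information shared, so that the most central participants can be ranked:

    # Weighted, undirected "who shares with whom" network; weights are invented.
    edges = {
        ("nurse_a", "nurse_b"): 5,
        ("nurse_a", "nurse_c"): 2,
        ("nurse_b", "nurse_d"): 4,
        ("nurse_c", "nurse_d"): 1,
    }

    def weighted_degree(node):
        """Total sharing in which a node participates (its weighted degree)."""
        return sum(w for pair, w in edges.items() if node in pair)

    nodes = {n for pair in edges for n in pair}
    for node in sorted(nodes, key=weighted_degree, reverse=True):
        print(node, weighted_degree(node))  # nurse_b 9, nurse_a 7, nurse_d 5, nurse_c 3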
Table 10.1 Engineer and social design views of research and applications

Engineering or Designer Approach | Social Design View
Explicit views of work | Tacit views of work
Work can be documented, made visible. | Aspects of work are silent and shared.
Tasks are easy to articulate and transfer. | Work is understood without articulation.
Training makes work possible. | Learning makes work possible.
Tasks are at the core of work. | Knowledge is at the core of work.
Position is clear in a hierarchy. | Position is defined by informal political networks and contacts.
Procedures and techniques are the basis of action or doing work. | Conceptual understanding is the basis of action or doing work.
Methods and procedures are the guides to work. | Rules-of-thumb and judgment are the guides to work.

Intended goals
Improve work efficiency. | Improve work practices.
Reduce human error. | Discover and solve problems.

Design assumptions
User needs are identified by what is visible and documented; they can be rationalized into one set of needs. | User needs emerge from observing everyday work practices, which may conflict, and thus there are often real differences in needs.
Design is linear and can be documented at the end of system development. | System design is iterative and requires prototyping.
Individual work is to be supported through process clarity. | Collaboration and collaborative learning take place in a social context.
Efficiency is a desired outcome. | Skill development is a desired outcome.

Technological choices
People can adapt to technologies chosen to support organizational values. | Configurations matter and interact with human work activities.
Convenience is provided by technology. | Flexibility requires social choices.
Figure 10.1 Data flow diagram comprising three data stores (open rectangles
Ex1, Ex2, and Ex3)—Video Collection, Friend Listing, and Request
List—and two processes (rectangles 1 and 2)—Inquiry and
Requests. When a Friend (an external agent, indicated by the oval
bubble) inquires about a video, data flow from the Video Collection
(Ex1) file to the Inquiry Process. When a Friend asks to borrow a
video, data flow from Ex1, Ex2, and Ex3 to the Requests process,
which provides the video and a return agreement to the Friend.
Facebook, for example, gradually moved to a revised version with less clutter and continued features that allow a constantly updated analysis of friends' changes.
Product Development
SI has adopted techniques from anthropology and sociology. Careful observation of behavior helps SI researchers recognize nuances that engineers and designers might miss.
For example, a 1990s study of the work practices in a telecommunications
company provided a social perspective of information systems (Sachs, 1995).
The researcher had been trained as an anthropologist; she looked at the
social choices involved in the installation of telephone lines. The ICTs used
by the technicians had been designed with assumptions about the number of
employees required to respond to a service request. When efficiency consul-
tants observed technicians talking among themselves and forming teams to
go on calls, the behavior was seen as socializing and nonproductive. The consultants recommended reducing conversations and sending out the first available technician, and they built a “trouble ticket” system to sched-
ule and track calls and responses. However, the expected increase in produc-
tivity and efficiency did not occur. The SI researcher noted that with the
eliminated conversations, the technicians “compared notes … they figured
out what [a problem] was and worked on it together” (Sachs, 1995, p. 39). In
the absence of an SI perspective, valuable troubleshooting behavior had
been overlooked. In addition, understanding the social context revealed that
the technicians developed specializations; some were more efficient at
responding to certain problems than were others. The trouble ticket strategy
had assumed that standardized training made all technicians equal and that
any one of them could respond to a service request.
Digitization of records in healthcare requires sophisticated data process-
ing and management. Patterson, Cook, and Render (2002) report on a
Veterans Administration hospital’s efforts to digitize patient records. The
project took a simple, direct view of the work done by nurses, and the records
were designed to reflect these nurses’ work habits. Because the system
designers did not collect or take account of the collaboration that goes into
creating patient records, the nurses had to spend more time, not less, in
working around the new system that was intended to reduce workloads and
improve efficiency.
Dutton (2005) pointed out that “the intellectual craftwork that underpins
its multidisciplinary research is critical to the success of Social Informatics”
(p. xiii); conducting SI research is not a cookbook operation. It requires schol-
ars who have patience, a willingness to collaborate, the ability to observe
people in social settings, the agility in analysis to see patterns and trends, and
a willingness to transfer intellectual capital to others. Nevertheless, the need
to combine technological design with qualitative research about users will
grow as computers have impacts on users in changing social contexts and as
users influence communications.
References
Bell, D. (1973). The coming of post-industrial society: A venture in social forecasting. New York:
Basic Books.
Blair, D. C. (2002). Knowledge management: Hype, hope, or help? Journal of the American
Society for Information Science and Technology, 53(12), 1019–1028.
Brown, J. S., & Duguid, P. (2000). The social life of information. Cambridge, MA: Harvard
Business School Press.
Davenport, E., & Hall, H. (2002). Organizational knowledge and communities of practice.
Annual Review of Information Science and Technology, 36(1), 171–227.
Davenport, T., & Prusak, L. (1998). Working knowledge. Boston: Harvard Business School
Press.
Dutton, W. H. (2005). Foreword. In R. Kling, H. Rosenbaum, & S. Sawyer (Eds.),
Understanding and communicating social informatics: A framework for studying and
teaching the human contexts of information and communication technologies (pp. xi–xv).
Medford, NJ: Information Today.
Fry, B. M. (1975). A change in title and scope to meet changing needs. Information Processing
& Management, 11(1), i.
Jin, T., Bouthillier, F., Bowen, B., Boettger, S., Montgomery, M., Pillania, R., et al. (2007).
Traditional and non-traditional knowledge management in research and practice.
Proceedings of the American Society for Information Science and Technology, 44, 1–5. DOI:
10.1002/meet.1450440114.
Kling, R. (2003). Social informatics. Encyclopedia of library and information science (pp.
2656–2661). New York: Dekker.
Kling, R., Rosenbaum, H., & Sawyer, S. (2005). Understanding and communicating social
informatics: A framework for studying and teaching the human contexts of information
and communication technologies. Medford, NJ: Information Today.
Macevičiūtė, E., & Wilson, T. D. (2002). The development of the information management
research area. Information Research, 7(3). Retrieved November 11, 2010, from informationr.
net/ir/7-3/paper133.html.
Patterson, E. S., Cook, R. I., & Render, M. L. (2002). Improving patient safety by identifying
side effects from introducing bar coding in medication administration. Journal of the
American Medical Informatics Association, 9(5), 540–553. DOI:10.1197/jamia.M1061.
Sachs, P. (1995, September). Transforming work: Collaboration, learning, and design.
Communications of the ACM, 38(9), 36–44. DOI:10.1145/223248.223258.
Suchman, L. (1996). Supporting articulation work. In R. Kling (Ed.), Computerization and
controversy: Value conflicts and social choices (2nd ed., pp. 407–413). San Diego, CA:
Academic Press.
Swain, D. (2006). Can blogging be used to improve medication error collection as part of
health informatics knowledge management? In S. Hawamdeh (Ed.), Creating collabora-
tive advantage through knowledge and innovation (pp. 301–314). Hackensack, NJ: World
Scientific.
Swartz, J. (2008, September 22). Some Facebook users aren’t fond of website’s new face. USA
Today. Retrieved November 11, 2010, from www.usatoday.com/tech/products/2008-09-
21-facebook_N.htm.
Thompson, C. (2006, December 3). Open-source spying. New York Times. Retrieved
November 11, 2010, from www.nytimes.com/2006/12/03/magazine/03intelligence.html.
Westin, A., & Baker, M. (1972). Databanks in a free society: Computers, record-keeping, and
privacy. New York: Quadrangle Books.
Wilson, T. D. (1989). Towards an information management curriculum. Journal of
Information Science, 15(4–5), 203–209.
Wilson, T. D. (2003). Information management. In J. Feather & P. Sturges (Eds.), International encyclopedia of information and library science (2nd ed., pp. 263–278).
London: Routledge.
CHAPTER 11
Publication and
Information Technologies
Information Explosion
For decades researchers have noted the increasing rate of publication. Rider
(1944) suggested that libraries would be unable to cope with the geometrical
increase in the number of books published. Price (1975) demonstrated the
geometric growth of scientific publications; both Rider’s and Price’s books are
classics in the literature about the information explosion. Lyman and Varian
(2003) reported that in 2002 the world produced about 5 exabytes of new
information, stored on print, film, magnetic, and optical storage media (of
which 92 percent was stored on magnetic media, mostly hard disks). This is
almost 800 megabytes of recorded information per person (the equivalent of
about 30 feet [9.1 meters] of books) in 1 year.
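The per-person figure is simple arithmetic; the quick check below assumes a rough 2002 world population (the population estimate is ours, not Lyman and Varian's):

    new_information_bytes = 5e18   # 5 exabytes (Lyman & Varian, 2003)
    world_population_2002 = 6.3e9  # rough estimate, assumed for this check

    megabytes_per_person = new_information_bytes / world_population_2002 / 1e6
    print(f"{megabytes_per_person:.0f} MB per person")  # ~794, i.e., almost 800 MB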
Although there is a dramatic increase in the number of publications, this
alone is not evidence that the amount of information has increased; the rapid
obsolescence of many recent publications and the exceptional staying power
of works from decades or centuries ago demonstrate that merely publishing
information does not guarantee its lasting utility (Spang-Hanssen, 2001).
Collaboratories
Collaboration among scientists has increased dramatically in the past cen-
tury; the equipment and support needed for research in areas ranging from
astronomy to genetics have hastened the development of large-scale science (Weinberg, 1961; what Price, 1963, termed big science). The number of jour-
nal articles with more than one author continues to increase—authors some-
times number in the hundreds, and today authors are much more likely to
acknowledge support from technical and support staff as well as funding
agencies (Sonnenwald, 2007). Expectations of the scholarly community
(“publish or perish”) may encourage the elevation of what would have been
an acknowledgment to a co-authorship status and the growth of “hyper-
authorship” (Cronin, 2001, p. 558). Scientific collaborations extend beyond
local institutions, with many having multidisciplinary perspectives and
global scope. In addition to their scientific and technical skills, researchers
need administrative abilities to coordinate their groups. They rely on tech-
nology to support the communication, provide access to scientific instru-
ments, and record the information that is fundamental to modern science.
In 1989 William Wulf, at the National Science Foundation, coined the term
collaboratory by combining the words collaboration and laboratory. He
defined it as “a center without walls, in which users can perform their research without regard to geographical location—interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information in digital libraries.”
Peer Review
Publishers function as filters, deciding what information will be dissemi-
nated—they are a primary defense against being overwhelmed by the infor-
mation explosion. In scholarly publishing, this filtering is often done by means
of peer review: Researchers knowledgeable in the field read, comment on, and
make recommendations regarding the acceptance of work submitted for pub-
lication. Peer review is used extensively in the sciences and social sciences.
The peer review process is usually “blind,” in that the author does not
know who the reviewers are. It may also be “double blind,” so that the
reviewer does not know the identity of the author(s) he or she is reviewing;
however, experienced reviewers and authors may be able to guess the iden-
tity, even if they are not told. In some cases the author may suggest reviewers
to the editor. Critics of peer review have raised concerns about its psycholog-
ical limitations (for example, reviewers who move from criticism to abuse of
the work at hand), conflicts of interest (for example, reviewers trying to sup-
press papers critical of their own research or of work in which they have a
financial interest), and ethical problems (such as the reviewer suppressing or
even stealing the author’s ideas). The internet can support more transparent
interactions that provide alternatives to the traditional procedures for peer
review. The science journal Nature (2006) hosted a debate that featured vari-
ous perspectives on the problems and possible new approaches.
Open Access
Open access (OA) is free online access, for any user, to the full text of scien-
tific and scholarly material: free availability and unrestricted use. Suber
(2007) describes OA as the “unrestricted reading, downloading, copying,
sharing, storing, printing, searching, linking, and crawling of the full text of
the work” (The legal basis of OA is the consent of the copyright holder …).
OA material is usually copyrighted so that the author can maintain the
integrity of the work rather than limit its use. Many authors use a Creative
Commons license (creativecommons.org/about/licenses), which allows the
author to set the level of restrictions on use of the work. Levels range from
attribution, where users may distribute, display, and perform the copy-
righted work and derivative works based upon it, but only if they give proper
credit to the author, to attribution-noncommercial-no derivatives, which
allows copying and sharing of the work as long as the license holder receives
credit and the work is not changed or used commercially.
OA thus challenges the established economic models of publishing, in
which the publisher takes the risk, foots the bill, and reaps any economic
reward. OA is not without production costs; however, because so much of the
creation, preparation, and distribution are done online, these can be lower
than the costs of print publication. Two models of OA have evolved for jour-
nal articles: gold and green.
An OA journal (the gold route) follows the usual practices for submission
and peer review. Journal articles are published online and are freely avail-
able. Costs may be paid by some combination of the authors, an author’s
employer or funding agency, subsidies from universities or professional
organizations, institutional subscriptions, and advertising. The Public Library of Science (www.plos.org/index.php) is a nonprofit organization of scientists and physicians that publishes a suite of gold OA journals.
Authors cite earlier work for many reasons, not all of which indicate subject relatedness; for example, ceremonial citations might mention eminent people or colleagues whose work is only tangentially related to the paper (Cole & Cole, 1973). Various schemes
have been suggested to clarify citer motivations, but Martyn’s (1964) criti-
cisms of bibliographic coupling also apply to co-citation analysis.
Regardless of the objections, bibliographic coupling and co-citation
analysis have been adopted and used extensively. Visualization of research
domains through bibliometric mapping (using the techniques of biblio-
graphic coupling and co-citation analysis) has become one of the major spe-
cialties in bibliometrics.
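Both measures reduce to simple set operations over reference lists. In the sketch below (document labels invented), the bibliographic coupling strength of two papers is the number of references they share, and the co-citation count of two documents is the number of later papers citing both:

    # Each paper's reference list, as a set (invented labels).
    references = {
        "paper_x": {"r1", "r2", "r3"},
        "paper_y": {"r2", "r3", "r4"},
        "paper_z": {"r1", "r4"},
    }

    def coupling_strength(a, b):
        """Bibliographic coupling: references shared by papers a and b."""
        return len(references[a] & references[b])

    def cocitation_count(a, b):
        """Co-citation: how many papers cite both a and b."""
        return sum(1 for refs in references.values() if {a, b} <= refs)

    print(coupling_strength("paper_x", "paper_y"))  # 2 (they share r2 and r3)
    print(cocitation_count("r1", "r4"))             # 1 (only paper_z cites both)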
11.4. Webometrics
Webometrics in information science is currently dominated by World Wide
Web link analysis and strongly influenced by citation analysis, being typically
applied to scientific data. This section discusses the use of link count metrics
in the broad context of informetrics, assessing methodologies and their
potential for general social science research. The closely related area of web
citation analysis is also reviewed, as are search engine evaluation and metric-
based research into blogs and social network sites.
In the early years of the web, several information scientists recognized the
structural similarity between hyperlinks and citations, noticing that both are
inter-document connections and pointers (Larson, 1996; Rodríguez i Gairín,
1997; Rousseau, 1997). This observation underpinned the creation of a new
field—webometrics (Almind & Ingwersen, 1997)—defined as the application
of quantitative techniques to the web, using methods drawn from informet-
rics (Björneborn & Ingwersen, 2004).
The power of the web could first be easily tapped for link analysis when
commercial search engines released interfaces allowing link searches
(Ingwersen, 1998; Rodríguez i Gairín, 1997). For example, in 1997 it became
possible with AltaVista to submit extremely powerful queries, such as for the
number of webpages in the world that linked to Swedish pages. This meant
that with a few hours’ work submitting search engine queries, the “impact” of
sets of websites could be compared (assuming that links, similar to citations,
measure the impact of published information). At the time, most citation
analysis was conducted with the use of the citation databases produced by
the Institute for Scientific Information, and the searcher or the searcher’s
institution paid for access. With link analysis, the web “database” is freely
available, allowing access to a wider set of potential researchers. With the use
of commercial search engines, the impact of many entities was compared,
including journals, countries, universities or departments within a country,
and library websites (An & Qiu, 2004; Harter & Ford, 2000; Ingwersen, 1998;
Smith, 1999; Tang & Thelwall, 2008; Thomas & Willet, 2000). The early studies
showed that care was needed to conduct appropriate link analyses because
of many complicating factors such as duplicate webpages and sites, errors in
search engine reporting, incomplete search engine coverage of the web, link
replication within a site, and spurious or trivial reasons for link creation (Bar-
Ilan, 2001; Björneborn & Ingwersen, 2001; Egghe, 2000; Harter & Ford, 2000;
Smith, 1999; Snyder & Rosenbaum, 1999; van Raan, 2001). Nevertheless, link
analysis has produced interesting and useful results and has been adopted by
several non-information science fields, as shown below.
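Ingwersen's (1998) web impact factor captures the basic calculation behind such comparisons: roughly, the number of pages linking to a site divided by the number of pages in the site. A minimal sketch with made-up counts (in practice the counts came from search engine queries, subject to all the caveats just listed):

    def web_impact_factor(inlinking_pages, site_pages):
        """Roughly, Ingwersen's (1998) WIF: link pages divided by site pages."""
        return inlinking_pages / site_pages

    # Made-up counts of the kind once obtained from link queries.
    sites = {"univ-a.example": (12000, 3000), "univ-b.example": (5000, 2500)}
    for site, (inlinks, pages) in sites.items():
        print(site, web_impact_factor(inlinks, pages))  # 4.0, then 2.0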
This review of webometrics focuses on recent results in the most devel-
oped area, link analysis, and covers web-based citation analysis more briefly.
The main review is preceded by a brief methodological discussion and spec-
ulation about the range of types of information that this new informetric
technique may be employed to help measure.
Link Analysis
Link Creation Motivations
A few studies have investigated why links are created. These studies mainly
operate on a small scale and use either an information science–style classifica-
tion approach or a more sociological, ethnographic method. Link creators have
a variety of motivations (Bar-Ilan, 2004b, 2004c; Harries, Wilkinson, Price,
Fairclough, & Thelwall, 2004; Wilkinson, Harries, Thelwall, & Price, 2003). Link
patterns vary according to the level at which they are aggregated, with geographic and cognitive connections dominating at different subnational, national, and international levels. Disciplinary and national contexts help determine how technologies are adopted and adapted. Perhaps more sur-
prising are Li’s findings of international differences within the same field; for
instance, biology links in Australia were significantly less international (60
percent) than those of the U.K. (74 percent) and Canada (80 percent). From
a functionalist perspective, and given the international nature of science,
broad similarities in web use might be expected. Nevertheless, the differ-
ences support organizational sociologies of science that emphasize the
importance of multiple social factors in the practice of science (Fuchs, 1992).
From a practical, informetric perspective, the lesson is that link counts are
perhaps most valuable for identifying unexpected differences and, because
link pages can be traced, identifying their cause. Thus, link analysis seems a
natural partner to sociologies of science, as is citation analysis.
A good example is the use of link analysis, in conjunction with other
sources of information about connections among researchers (such as
European collaborative project membership), to investigate patterns of col-
laboration in Europe for a specific field (Heimeriks, Hörlesberger, & Van den
Besselaar, 2003). This example demonstrates the value of link analysis as part
of a multiple-method approach in scientometrics.
This framework echoes many of the points made previously, but notice
that there is a preliminary stage with a pilot study. This is important because
the variety of uses made of the web (Burnett & Marshall, 2002) means that
our intuitions about how web links ought to be used in a particular context
can be wrong. The pilot study allows a research problem that will not yield an
informative link analysis to be aborted before too much effort has been
invested. Perhaps the most important message of the framework, however, is
the centrality of link-type classification studies. If we have no idea why links
are created, then we can make only the most abstract inferences from link
counts.
• Lists of the most popular blogs (by analogy with most often-cited
authors)
Beyond deciding what is possible for future blog link analyses, it is impor-
tant to discuss what is likely to be useful and how link analysis can best be
exploited. Clearly there is no pressing need for evaluative blog link analy-
sis comparable to the need to use citations to evaluate scientists’ productiv-
ity: it might be useful but will not significantly help to direct government
research funding. The key findings will likely be most useful for social science
research by providing information about the phenomenon of blogging and
by providing data about the spread of individual topics (e.g., presidential
debate topics could be of interest to political scientists) or, more generally, for
the analysis of information diffusion in blogspace by finding spreading pat-
terns that are common across many topics (Gill, 2004; Gruhl, Guha, Liben-
Nowell, & Tomkins, 2004). Topic-centered blog link analysis will probably
need to employ some kind of text analysis to identify topic-relevant blogs or
blog posts and will probably need to be semi-automated with software to
gather and filter blog data.
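A minimal sketch of that kind of semi-automated filtering (the posts and keywords are invented): select topic-relevant posts by keyword, then tally the links among them to trace how a topic spreads:

    # Invented blog posts: (author, text, set of blogs the post links to).
    posts = [
        ("blog_a", "the presidential debate last night", {"blog_b"}),
        ("blog_b", "my cat learned a new trick", set()),
        ("blog_c", "debate reactions are everywhere", {"blog_a", "blog_b"}),
    ]
    topic_keywords = {"debate", "presidential"}

    # Crude text-analysis step: keep only topic-relevant posts.
    relevant = [p for p in posts if topic_keywords & set(p[1].lower().split())]

    # Tally links among the relevant blogs to trace the topic's spread.
    link_counts = {}
    for author, _text, links in relevant:
        for target in links:
            link_counts[(author, target)] = link_counts.get((author, target), 0) + 1
    print(link_counts)  # e.g., {('blog_a', 'blog_b'): 1, ('blog_c', 'blog_a'): 1, ...}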
Studies comparing web citations with Web of Science (WoS) citations have found significant correlations, with more web citations being found in total. Some additional patterns were identified, such
as weaker correlations for non-U.K. and non-U.S. journals. Other studies
have also compared web and traditional citations in more restricted settings,
such as the CiteSeer computer science web digital library (Goodrum,
McCain, Lawrence, & Giles, 2001), also finding significant correlations with
WoS data. A significant difference in the case of CiteSeer was that computer
science conference papers attracted significantly more web-based citations
than journal articles did, in comparison with their WoS citations. It remains
an open question whether the differences identified represent shortcomings
in WoS data or web data or both. Nevertheless, it may be that the main appli-
cation of web-based citation analysis is in providing a second data source
with which to compare the WoS citation index. This is an important role,
given the use of WoS data in important decisions such as promotion and
funding.
Social Networks
Another new webometrics direction is the analysis of the contents of social
network sites. This combination of webometrics and data mining involves
extracting specific data from webpages, such as the number of days since the
last login of each member. Figure 11.2 shows these data for MySpace for a
random sample of members. Member IDs are given in ascending order; this
graph shows that many members (the top line in the graph, about a third of
all members) give up within a week of joining but that many others (the bot-
tom line of the graph, also about a third) continue to use the site at least
weekly (Thelwall, 2008b).
Figure 11.1 A blog trend graph of the cartoons debate: Volume of blog postings
related to the Danish cartoons debate (Thelwall, 2007)
Figure 11.2 Last logon dates plotted against user IDs for random MySpace
members (reproduced with permission from Thelwall, 2008b)
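The extraction step in such studies amounts to pulling one field out of each profile page. A hedged sketch (the HTML fragments and pattern below are invented; real profile pages were messier and changed over time):

    import re
    from datetime import date

    # Invented profile fragments containing a last-login field.
    profiles = [
        '<span class="lastlogin">Last Login: 11/5/2008</span>',
        '<span class="lastlogin">Last Login: 9/28/2008</span>',
    ]

    pattern = re.compile(r"Last Login:\s*(\d{1,2})/(\d{1,2})/(\d{4})")
    crawl_date = date(2008, 11, 11)  # date of the hypothetical crawl

    for html in profiles:
        match = pattern.search(html)
        if match:
            month, day, year = (int(g) for g in match.groups())
            days_since_login = (crawl_date - date(year, month, day)).days
            print(days_since_login)  # 6, then 44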
References
Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3),
211–230.
Almind, T. C., & Ingwersen, P. (1997). Informetric analyses on the World Wide Web:
Methodological approaches to “webometrics.” Journal of Documentation, 53(4),
404–426.
An, L., & Qiu, J. P. (2004). Research on the relationships between Chinese journal impact fac-
tors and external web link counts and web impact factors. Journal of Academic
Librarianship, 30(3), 199–204.
Association of College and Research Libraries. (2009). Integrating scholarly communication
into your library. Chicago, IL: American Library Association. Retrieved November 11,
2010, from www.acrl.ala.org/scholcomm/node/21.
Bar-Ilan, J. (1999). Search engine results over time: A case study on search engine stability.
Retrieved November 11, 2010, from www.cindoc.csic.es/cybermetrics/articles/v2i1p1.
html.
Bar-Ilan, J. (2001). Data collection methods on the web for informetric purposes: A review
and analysis. Scientometrics, 50(1), 7–32.
Bar-Ilan, J. (2004a). Blogarians: A new breed of librarians. Proceedings of the Annual Meeting
of the American Society for Information Science and Technology, 41, 119–128.
Bar-Ilan, J. (2004b). A microscopic link analysis of academic institutions within a country:
The case of Israel. Scientometrics, 59(3), 391–403.
Bar-Ilan, J. (2004c). Self-linking and self-linked rates of academic institutions on the web.
Scientometrics, 59(1), 29–41.
Bar-Ilan, J., & Peritz, B. C. (2004). Evolution, continuity, and disappearance of documents on
a specific topic on the web: A longitudinal study of “informetrics.” Journal of the
American Society for Information Science and Technology, 55(11), 980–990.
Baym, N. (2008). Tunes that bind? Predicting friendship strength in a music-based social net-
work. Retrieved October 21, 2008, from www.onlinefandom.com/wp-content/uploads/
2008/10/tunesthatbind.pdf.
BBC. (2005). Blog reading explodes in America. BBC News. Retrieved November 11, 2010,
from news.bbc.co.uk/1/hi/technology/4145191.stm.
Björneborn, L. (2004). Small-world link structures across an academic web space: A library
and information science approach. Copenhagen, Denmark: Royal School of Library and
Information Science.
Björneborn, L., & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1),
65–82.
Björneborn, L., & Ingwersen, P. (2004). Towards a basic framework for webometrics. Journal
of the American Society for Information Science and Technology, 55(14), 1216–1227.
Blair, D. C. (1990). Language and representation in information retrieval. Amsterdam:
Elsevier.
Borko, H. (1968). Information science: What is it? American Documentation, 19(1), 3–5.
Bradford, S. C. (1948). Documentation. London: Crosby Lockwood.
Bradford, S. C. (1953). Documentation (2nd ed.). London: Crosby Lockwood.
Brookes, B. C. (1969). Bradford’s law and the bibliography of science. Nature, 224(5223),
953–956.
Burnett, R., & Marshall, P. (2002). Web theory: An introduction. London: Routledge.
Caldas, A. (2003). Are newsgroups extending “invisible colleges” into the digital infrastruc-
ture of science? Economics of Innovation and New Technology, 12(1), 43–60.
Capocci, A., Servedio, V. D. P., Colaiori, F., Buriol, L. S., Donato, D., Leonardi, S., et al. (2006).
Preferential attachment in the growth of social networks: The case of Wikipedia. Retrieved
November 11, 2010, from arxiv.org/abs/physics/0602026.
Cole, J. R., & Cole, S. (1973). Social stratification in science. Chicago: University of Chicago
Press.
Committee Toward a National Collaboratory: Establishing the User-Developer Partnership,
National Research Council. (1993). National collaboratories: Applying information tech-
nology for scientific research. Washington, DC: National Academies Press. Retrieved
November 11, 2010, from www.nap.edu/catalog.php?record_id=2109.
Crane, D. (1972). Invisible colleges: Diffusion of knowledge in scientific communities. Chicago:
University of Chicago Press.
Cronin, B. (2001). Hyperauthorship: A postmodern perversion or evidence of a structural
shift in scholarly communication practices? Journal of the American Society for
Information Science and Technology, 52(7), 558–569.
Diodato, V. (1994). Dictionary of bibliometrics. New York: Haworth Press.
Egghe, L. (1991). The exact place of Zipf’s and Pareto’s law amongst the classical information
laws. Scientometrics, 20(1), 93–106.
Egghe, L. (2000). New informetric aspects of the internet: Some reflections, many problems.
Journal of Information Science, 26(5), 329–335.
Egghe, L., & Rousseau, R. (1990). Introduction to informetrics: Quantitative methods in
library, documentation and information science. Amsterdam: Elsevier.
Fedorowicz, J. (1982). The theoretical foundation of Zipf’s law and its application to the bib-
liographic database environment. Journal of the American Society for Information
Science, 33(5), 285–293.
Finholt, T. (2002). Collaboratories. Annual Review of Information Science and Technology, 36,
73–107.
Fry, J. (2006). Studying the scholarly web: How disciplinary culture shapes online represen-
tations. Cybermetrics, 10(1). Retrieved June 1, 2009, from www.cindoc.csic.es/cybermetrics/
articles/v10i1p12.html.
Fuchs, S. (1992). The professional quest for truth: A social theory of science and knowledge.
Albany: State University of New York Press.
Garrido, M., & Halavais, A. (2003). Mapping networks of support for the Zapatista movement:
Applying social network analysis to study contemporary social movements. In M.
McCaughey & M. Ayers (Eds.), Cyberactivism: Online activism in theory and practice (pp.
165–184). London: Routledge.
Gill, K. E. (2004). How can we measure the influence of the blogosphere? Paper presented at
the WWW 2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and
Dynamics. Retrieved May 11, 2011, from citeseerx.ist.psu.edu/viewdoc/download?doi=
10.1.1.124.2509+rep=rep1+type=pdf.
Goodrum, A. A., McCain, K. W., Lawrence, S., & Giles, C. L. (2001). Scholarly publishing in the
internet age: A citation analysis of computer science literature. Information Processing &
Management, 37(5), 661–676.
Granovetter, M. (1973). The strength of weak ties. American Journal of Sociology, 78,
1360–1380.
Gruhl, D., Guha, R., Liben-Nowell, D., & Tomkins, A. (2004). Information diffusion through
blogspace. Paper presented at the WWW2004, New York. DOI: 10.1145/988672.988739.
Hajjem, C., Harnad, S., & Gingras, Y. (2005). Ten-year cross-disciplinary comparison of the
growth of open access and how it increases research citation impact. IEEE Data
Engineering Bulletin, 28(4), 39–47. Retrieved July 1, 2009, from eprints.ecs.soton.ac.uk/
12906.
Harries, G., Wilkinson, D., Price, E., Fairclough, R., & Thelwall, M. (2004). Hyperlinks as a data
source for science mapping. Journal of Information Science, 30(5), 436–447.
Harter, S., & Ford, C. (2000). Web-based analysis of e-journal impact: Approaches, problems,
and issues. Journal of the American Society for Information Science, 51(13), 1159–1176.
Heimeriks, G., Hörlesberger, M., & Van den Besselaar, P. (2003). Mapping communication
and collaboration in heterogeneous research networks. Scientometrics, 58(2), 391–413.
Heimeriks, G., & Van den Besselaar, P. (2006). Analyzing hyperlink networks: The meaning of
hyperlink-based indicators of knowledge. Cybermetrics, 10(1). Retrieved June 1, 2009,
from www.cindoc.csic.es/cybermetrics/articles/v10i1p1.html.
Hjørland, B., & Nicolaisen, J. (2005). Bradford’s law of scattering: Ambiguities in the concept
of “subject.” Proceedings of the 5th International Conference on Conceptions of Library
and Information Sciences, 96–106. Retrieved December 20, 2010, from vip.db.dk/jni/articles/
hjorland&nicolaisen(2005).pdf.
Hulme, E. W. (1923). Statistical bibliography in relation to the growth of modern civilization.
London: Grafton.
Ingwersen, P. (1998). The calculation of web impact factors. Journal of Documentation, 54(2),
236–243.
Kessler, M. M. (1963). Bibliographic coupling between scientific papers. American
Documentation, 14(1), 10–25.
Kling, R., & McKim, G. (2000). Not just a matter of time: Field differences and the shaping of
electronic media in supporting scientific communication. Journal of the American
Society for Information Science, 51(14), 1306–1320.
Kumar, R., Novak, J., Raghavan, P., & Tomkins, A. (2004). Structure and evolution of blog-
space. Communications of the ACM, 47(12), 35–39.
Larson, R. R. (1996). Bibliometrics of the World Wide Web: An exploratory analysis of the intel-
lectual structure of cyberspace. Paper presented at the ASIS 59th annual meeting,
Baltimore.
Li, X., Thelwall, M., Musgrove, P. B., & Wilkinson, D. (2003). The relationship between the
WIFs or Inlinks of computer science departments in UK and their RAE ratings or research
productivities in 2001. Scientometrics, 57(2), 239–255.
Li, X., Thelwall, M., Wilkinson, D., & Musgrove, P. B. (2005a). National and international uni-
versity departmental web site interlinking, part 1: Validation of departmental link analy-
sis. Scientometrics, 64(2), 151–185.
Li, X., Thelwall, M., Wilkinson, D., & Musgrove, P. B. (2005b). National and international uni-
versity departmental web site interlinking, part 2: Link patterns. Scientometrics, 64(2),
187–208.
Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the
Washington Academy of Sciences, 16(12), 317–324.
Lyman, P., & Varian, H. R. (2003). How much information? 2003. Retrieved November 11,
2010, from www.sims.berkeley.edu/how-much-info-2003.
Marlow, C. (2004). Audience, structure and authority in the weblog community.
International Communication Association Conference. Retrieved November 11, 2010,
from alumni.media.mit.edu/~cameron/cv/pubs/04-01.pdf.
Marshakova, I. V. (1973). A system of document connection based on references. Scientific
and Technical Information Serial of VINITI, 6(2), 3–8.
Martyn, J. (1964). Bibliographic coupling. Journal of Documentation, 20(4), 236.
Mayr, P., & Tosques, F. (2005). Google web APIs: An instrument for webometric analyses?
Retrieved November 11, 2010, from www.ib.hu-berlin.de/%7Emayr/arbeiten/ISSI2005_
Mayr_Toques.pdf.
Mettrop, W., & Nieuwenhuysen, P. (2001). Internet search engines: Fluctuations in document
accessibility. Journal of Documentation, 57(5), 623–651.
Mosteller, F., & Wallace, D. L. (1964). Inference and disputed authorship: The Federalist.
Reading, MA: Addison-Wesley.
Nardi, B. A., Schiano, D. J., Gumbrecht, M., & Swartz, L. (2004). Why we blog.
Communications of the ACM, 47(12), 41–46.
Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the
evaluation of scientific activity. Cherry Hill, NJ: Computer Horizons.
Nature. (2006). Nature’s peer review debate. Retrieved July 1, 2009, from www.nature.com/
nature/peerreview/debate/index.html.
Nicolaisen, J. (2007). Citation analysis. Annual Review of Information Science and Technology,
41, 609–641.
Nisonger, T. E. (1998). Management of serials in libraries. Englewood, CO: Libraries
Unlimited.
Ohly, H. P. (1982). A procedure for comparing documentation language applications: The
transformed Zipf curve. International Classification, 9(3), 125–128.
Otte, E., & Rousseau, R. (2002). Social network analysis: A powerful strategy, also for the
information sciences. Journal of Information Science, 28(6), 441–453.
Park, H. W. (2003). Hyperlink network analysis: A new method for the study of social struc-
ture on the web. Connections, 25(1), 49–61.
Park, H. W., & Thelwall, M. (2003). Hyperlink analyses of the World Wide Web: A review.
Journal of Computer-Mediated Communication, 8(4). Retrieved November 11, 2010, from
jcmc.indiana.edu/vol8/issue4/park.html.
Potter, W. G. (1988). “Of making many books there is no end”: Bibliometrics and libraries.
Journal of Academic Librarianship, 14, 238a–238c.
Price, D. J. D. (1963). Little science, big science. New York: Columbia University Press.
Thelwall, M. (2007). Blog searching: The first general-purpose source of retrospective public
opinion in the social sciences? Online Information Review, 31(3), 277–289.
Thelwall, M. (2008a). Extracting accurate and complete results from search engines: Case
study Windows Live. Journal of the American Society for Information Science and
Technology, 59(1), 38–50.
Thelwall, M. (2008b). Social networks, gender and friending: An analysis of MySpace mem-
ber profiles. Journal of the American Society for Information Science and Technology,
59(8), 1321–1330.
Thelwall, M. (2009). Homophily in MySpace. Journal of the American Society for Information
Science and Technology, 60(2), 219–231.
Thelwall, M., Wouters, P., & Fry, J. (2008). Information-centred research for large-scale analy-
sis of new information sources. Journal of the American Society for Information Science
and Technology, 59(9), 1523–1527.
Thomas, O., & Willet, P. (2000). Webometric analysis of departments of librarianship and
information science. Journal of Information Science, 26(6), 421–428.
UNISIST. (1971). Study report on the feasibility of a World Science Information System by the
United Nations Educational, Scientific and Cultural Organization and the International
Council of Scientific Unions. Paris: UNESCO. Retrieved November 11, 2010, from unesdoc.
unesco.org/images/0006/000648/064862eo.pdf.
van Raan, A. F. J. (2001). Bibliometrics and internet: Some observations and expectations.
Scientometrics, 50(1), 59–63.
Vasileiadou, E., & Van den Besselaar, P. (2006). Linking shallow, linking deep. How scientific
intermediaries use the web for their network of collaborators. Cybermetrics, 10(1).
Retrieved May 11, 2011, from dare.uva.nl/document/22526.
Vaughan, L., & Shaw, D. (2003). Bibliographic and web citations: What is the difference?
Journal of the American Society for Information Science and Technology, 54(14),
1313–1322.
Vaughan, L., & Shaw, D. (2005). Web citation data for impact assessment: A comparison of
four science disciplines. Journal of the American Society for Information Science and
Technology, 56(10), 1075–1087.
Voss, J. (2005). Measuring Wikipedia. Proceedings of the 10th International Conference of the
International Society for Scientometrics and Informetrics, 221–231.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications.
Cambridge, UK: Cambridge University Press.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature,
393, 440–442.
Weinberg, A. (1961). Impact of large-scale science on the United States. Science, 134,
161–164.
White, H. D. (1981). “Bradfordizing” search output: How it would help online users. Online
Review, 5(1), 47–54.
Wilkinson, D., Harries, G., Thelwall, M., & Price, E. (2003). Motivations for academic web site
interlinking: Evidence for the web as a novel source of information on informal scholarly
communication. Journal of Information Science, 29(1), 49–56.
Publication and Information Technologies 183
Wyllys, R. E. (1981). Empirical and theoretical bases of Zipf’s law. Library Trends, 30(1), 53–64.
Zipf, G. K. (1935). The psycho-biology of language. New York: Houghton-Mifflin.
Zipf, G. K. (1949). Human behavior and the principle of least-effort. New York: Addison-
Wesley.
Zuccala, A. (2006). Author cocitation analysis is to intellectual structure as web colink analy-
sis is to …? Journal of the American Society for Information Science and Technology,
57(11), 1486–1501.
CHAPTER 12
Information Policy
Information can be conceptualized in several ways (Braman, 2006):
1. As a resource
2. As a commodity
3. As a perception of pattern
4. As a societal force
Stigler (1961) described information as occupying a “slum dwelling in the
town of economics” (p. 213). As the economic and social significance of
information has been better understood, its importance to economic and
political well-being has been more fully recognized, and information policy
has come to be seen as necessary to the survival of the state (high policy).
Information has characteristics that make it fundamentally different from
other commodities. Most important, information is non-rivalrous: The inter-
nalization of information by one user does not impede the ability of another
to do the same. This property makes information unique among commodi-
ties: two people may use the same recipe to make bread, but they cannot
buy the same loaf. However, like bread, information can be
sold; it has economic value. Unlike bread, information does not require a
fixed physical form, and it is not bound by physical location.
Many researchers have suggested that the internet suffers from a tragedy-
of-the-commons market inefficiency. As Shapiro and Varian (1997) put it,
“The key aspect of information for the purposes of economic analysis is that
information is costly to produce, but very cheap to reproduce, especially in
digital form” (“Information as a public good”). As a consequence, digital
information becomes ubiquitous and thus almost free. Some people have
suggested that it is most efficient to make information freely available.
However, producing information is expensive, and these costs must be borne
somewhere in the information production chain. Shapiro and Varian suggest
that (government) information producers charge “at least incremental cost”
for the use of information. Failure to provide compensation to information
producers will reduce incentive for future innovation. Moreover, if the infor-
mation infrastructure is treated as a free, public resource, it may be devalued
and filled with the informational equivalent of rubbish and undesirable ele-
ments. The omnipresence of intrusive or disreputable advertising in some
free online services and applications (and its absence in subscription-based
services) provides a cautionary example.
Publishers’ roles in the dissemination of information are changing rapidly,
a situation sometimes perceived as challenging their continued existence.
Some commentators argue that these concerns are unjustified: the distribution
of IP has changed often before, each time presenting creators and stakeholders
with opportunities to tap new revenue streams. For
example, music publishers resisted the player piano and radio broadcast; and
the entertainment industry had concerns about home video technology (Sony
Corp. of America v. Universal City Studios, Inc., the “Betamax case”).
Technological restrictions on duplication and dissemination of information
are often suggested in such cases. To date, however, such technologies have
generally proven to be ineffective and perhaps even counterproductive.
Rightsholders have sought policies to enforce their rights, either in the courts
or through laws. In this way the government has been involved in information
policies related to the enforcement of property rights and rights of contract.
Given information’s entropic nature (in the sense of spreading and
becoming uniform) and ease of reproduction, some scholars (such as Lessig,
2002) have suggested that information is a public good. Public goods are typ-
ically defined as being non-rivalrous and difficult to restrict to specified
users. Advocates of digital rights management (DRM) and encryption tech-
nology maintain that information can be made exclusive, so that it would fail
the definitional test of being a public good. However, technological measures
to exclude unauthorized users from accessing information have so far not
been able to provide security, limiting information rightsholders’ abilities to
enforce their exclusive rights. In any case, the characteristics that make infor-
mation similar to a public good encourage economies of scale, with a small
number of providers becoming the overwhelming source of a given product.
Policy implications related to the economics of information are signifi-
cant. Taxation is one example: Should ecommerce be taxed, and if so, how
(Goolsbee, 2000)? More profound economic considerations include the ram-
ifications of information technologies on IP law, both domestically and inter-
nationally. The regulation of infrastructure such as the broadcast spectrum
and the use of standards are also policy issues with significant economic
impact. These issues will be addressed more fully later in this chapter. A
mundane but significant issue that demonstrated the importance of govern-
ment in the development of information policy is the passage of the Uniform
Electronic Transactions Act, which affirmed the legal authority of electronic
signatures and the validity of electronic contracts in ecommerce.
Some commentators have argued that the anticircumvention provisions of
the Digital Millennium Copyright Act (DMCA) have been largely
ineffective; others have noted the law’s potential chilling effect on research
on encryption and rights technologies by anyone other than major industry
stakeholders. It is certainly true that the DMCA has not prevented removal of
DRM technologies and distribution of content, particularly outside the U.S.
It is also likely that the prohibition of public research on these technologies
has weakened DRM in general because proprietary schemes are rapidly bro-
ken and workarounds developed to defeat technologies that the entertain-
ment industry developed at considerable time and expense (Reuters, 2002).
Further, the continued use of DRM has resulted in some public embarrass-
ment for the entertainment industry, as in the case of the Rootkits installed
by DRM technology included on some Sony products (Doctorow, 2005).
Early identifying marks include cattle brands, each design carrying its own
meaning (Devil’s Rope Museum, 2007). Other marks may have been applied
by tradesmen to sign work; this way of establishing that a product had been
manufactured by a member of a guild provided assurance that it conformed
to established standards of quality. In the U.S., the 1946 Lanham Act was
passed to remedy perceived shortcomings in previous laws, greatly expand-
ing eligibility and protections for trademarks.
Copyright
Copyright is legal protection for works of authorship, covering everything
from books to choreography to architecture. Copyright protection arises auto-
matically when an original work of authorship is fixed in a tangible means of
expression. The work can incorporate pre-existing material and still be origi-
nal. When pre-existing material is incorporated, the copyright on the new
work covers only the original material contributed by the author. (The U.S.
Copyright Law is available online at www.copyright.gov/title17.) Copyright
law exists to remedy a perceived market failure concerning products of the
mind: that intellectual works are non-rivalrous (consumption by one individ-
ual does not preclude consumption by another, as opposed to tangible goods
such as apples) and difficult to exclude (it is difficult to exclude the public
from access to the property). These conditions reduce the cost of the product
to the consumer, possibly to the point that the creator of the good would not
be rewarded for his or her efforts. To ensure that a creator is fairly compen-
sated, a legal system of rights has been created regarding the distribution,
reproduction, display, and sale of the product. The copyright holder also has
rights regarding the development of derivative works such as screenplays or
translations. A copyright owner has five exclusionary rights: reproduction,
modification, distribution, public performance, and public display.
The U.S. Copyright Act of 1976 allows copyright to be extended to any
work of authorship, including literary works; musical works; dramatic works;
pantomime and choreography; pictorial, graphic, and sculptural works;
motion pictures and other audiovisual works; sound recordings; and archi-
tectural works. Copyright is extended to any work as soon as it is “fixed in any
tangible medium of expression,” in the words of the Act and (unlike patents
and trademarks) no formal registration is required—although registration
has advantages. Copyright covers expressions of ideas, not ideas themselves.
Copyright now extends for 70 years past the lifetime of the creator, and 95
years from publication for corporate works made for hire. These long terms
are recent extensions; earlier
versions of U.S. copyright law had more limited terms of protection.
Infringement of copyright is a long-standing issue; tales of the outrage of
Robert Louis Stevenson and Mark Twain over rampant international piracy
are frequently cited. Infringement generally involves the unauthorized
reproduction and distribution of copies of a work without the rightsholder’s
permission. The entertainment industry has repeatedly fought new copying
technology, such as the videocassette recorder; has lost the fight; and has
then benefited because
the technology proliferated, enabled more consumers to access creative
works, and increased sales.
Some scholars have suggested that the contemporary IP environment is
no longer appropriate for traditional copyright protections. The negligible
cost and ease of distribution of electronic documents has dramatically
reduced the value of copies, although not the cost of producing the original.
Rightsholders face a future in which they must find new ways of generating
revenue from their IP or re-evaluate the economic incentive underlying the
creative impulse. Alternatives to traditional copyright have been suggested,
including Creative Commons licensing. Under Creative Commons, rights-
holders may choose which of their rights they wish to enforce. For example,
an author may decide to distribute the product freely, with the proviso that
any copies give appropriate attribution. Another alternative to traditional
copyright is found in open source. Traditionally associated with the Linux
operating system, open source allows users to modify and distribute code
and other IP. Because open source software is maintained by a community of
users, bugs and security flaws tend to be identified and dealt with more
quickly than those found in proprietary systems (where the knowledge of a
security vulnerability may go unremedied for some time). The disadvantage
of both schemes is that the potential for profit is much reduced; however,
broader distribution may lead to greater overall sales and may consequently
prove to be a feasible economic model. In spite of industry criticisms, open
source software accounts for a significant portion of the server community,
numerous governments have adopted it for their computer systems, and the
percentage of users relying on open source software on their home comput-
ers rises annually. Whatever their profitability, alternatives to traditional
copyright schemes are clearly becoming more attractive to users at every
level.
Patents
A patent is a monopoly granted to a rightsholder in exchange for disclosure
of the design of a novel, useful, and nonobvious innovation. The patent is
granted for a limited time, typically 20 years past the filing date. The right
conferred by the patent grant is, in the language of the statute (Contents and
term of patent; provisional rights) and of the grant itself, “the right to
exclude others from making, using, offering for sale, or selling” the inven-
tion in the U.S. or “importing” the invention into the country. What is
granted is not the right to make, use, offer for sale, sell, or import, but the
right to exclude others from doing so. Enforcement of these rights is the
responsibility of the patentholder; the U.S. Patent and Trademark Office
grants but does not enforce patentholders’ rights. Patent applications are
available from the office’s website, which also offers public search capabili-
ties; novelty is a condition of patentability and would-be registrants may
search the USPTO databases for “prior art.”
There are three types of patents: utility, design, and plant.
U.S. patent law also allows patents for synthetic genes, cell lines, organ-
isms, software, and algorithms. Patents for software are controversial
because all software is ultimately based on mathematical processes, and
mathematics and mathematical equations are not patentable under the law.
Some scholars justify the non-patentability of math on the basis that natural
phenomena and abstract ideas cannot be patented, and others say that
mathematics is not a process, article, or composition of matter. Software
patent critics also maintain that computer software design builds on previ-
ous design and that the patenting of algorithms and processes has a detri-
mental effect on innovation. Some critics argue that the patent review
process is not sufficiently robust to address the suitability of these products
for protection and that as a result, many frivolous or inappropriate protec-
tions have been granted. Countless legal battles have been fought over soft-
ware patents, and numerous frivolous patents have been found by courts to
be invalid (Electronic Frontier Foundation, 2009). This area of law is evolving
with the technologies and products it treats; policy and regulatory issues sur-
rounding these topics are likely to persist and to have broad implications for
years to come.
Trademarks
Trademarks are words, names, symbols, or devices used by owners to identify
their goods and services as distinct from those of others. Similar to trade-
marks, yet distinct from them, are trade secrets, defined as “information,
including a formula, pattern, compilation, program, device, method, tech-
nique, or process, that ... derives independent economic value ... from not
being generally known ... and ... is the subject of efforts that are reasonable
under the circumstances to maintain its secrecy” (National Conference of
Commissioners on Uniform State Laws, 1985).
Conformity assessment is the practice of ensuring that products, services, systems, and materials meet the
specifications of appropriate standards before they reach the markets. As the
organization indicates on its website (www.iso.org/iso/casco_2005.pdf),
“ISO guides and standards for conformity assessment represent an interna-
tional consensus on best practice. Their use contributes to the consistency of
conformity assessment worldwide and so facilitates trade” (p. 4). Standards
allow for better market penetration and enhanced competition. Such stan-
dards can be controversial, as in the case of the Microsoft OOXML standard,
which some industry players such as IBM (2008) deem unacceptable.
Global Standards
The ISO is a nongovernmental organization intended to form consensus
standards among various stakeholders, including industry, individuals, and
governments. With 157 subscriber nations, ISO is the largest developer of
standards and norms for a variety of areas (it has published more than 17,000
standards), ranging from traditional activities such as agriculture and con-
struction through mechanical engineering, manufacturing and distribution,
transport, medical devices, and information and communication technolo-
gies (International Organization for Standardization, 2009a).
ISO develops standards for products and ideas at work in the market-
place by consulting with experts in industry, business, and research to
develop consensus. Such standards tend to evolve with technology and are
subjected to periodic review (typically every 5 years) in order to stay current
and relevant.
The American National Standards Institute (ANSI) does not set standards
itself but accredits consensus standards proposed by others. One
example of a standards panel administered by ANSI is the Healthcare
Information Technology Standards Panel (www.hitsp.org), a body whose pur-
pose is to produce a set of standards to enable interoperability among
healthcare software applications. Topics addressed by this panel include
interoperability between electronic health record and laboratory systems,
emergency health records, and medication management interoperability
specifications.
As the National Information Standards Organization (NISO), an ANSI-accredited
standards developer, articulates in its mission statement, “NISO fosters the development
and maintenance of standards that facilitate the creation, persistent man-
agement, and effective interchange of information so that it can be trusted
for use in research and learning” (National Information Standards
Organization, 2010). NISO produces white papers, recommended practices,
technical reports, and publications intended to promulgate standards and
consensus for a wide variety of undertakings and disciplines in the fields of
information management. Developing standards involves peer review and
public commentary, which is open to any NISO voting member. Once
approved, such standards become American National Standards. One exem-
plar of this body’s work is ANSI/NISO standard Z39.85, the Dublin Core
Metadata Element Set, which defines descriptors (metadata) for information
across disciplines. The standard identifies 15 core elements (title, creator, subject,
description, date, format, source, and other characteristics that describe the
nature and form of the document, rather than its content); these are used to
define and describe documents in such a way that users and systems from
disparate fields may use them with ease.
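To make the element set concrete, the sketch below uses Python’s standard library to serialize a minimal Dublin Core record as XML. The record’s values are invented for illustration, and the choice of XML is an assumption of the example; Z39.85 defines the elements themselves rather than any particular carrier syntax.

    import xml.etree.ElementTree as ET

    # Element names (title, creator, ...) come from the 15-element set
    # of ANSI/NISO Z39.85; the values below are invented.
    DC_NS = "http://purl.org/dc/elements/1.1/"
    ET.register_namespace("dc", DC_NS)

    record = ET.Element("record")
    for element, value in [
        ("title", "A Sample Technical Report"),
        ("creator", "Doe, Jane"),
        ("subject", "Information science"),
        ("date", "2010"),
        ("format", "text"),
    ]:
        field = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        field.text = value

    print(ET.tostring(record, encoding="unicode"))

Because the element names and namespace are shared across disciplines, a system harvesting such records can interpret them without knowing anything about the discipline that produced them.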
The standards and practices regarding information storage and access
shape society in profound and subtle ways; the importance of policy decisions
regarding information cannot be overestimated. The U.S. government has
many branches and institutions that define information standards; one such
group is the National Institute of Standards and Technology (NIST), a branch
of the Department of Commerce; its purpose is to define measurements and
technology standards to increase U.S. competitiveness. NIST is also responsi-
ble for the publication of federal information processing standards codes,
which are used as standards for legal and statistical definition of people and
places. Such information policies dictate Congressional districting, resource
allocation, and taxation policies. NIST also maintains the National
Vulnerability Database (nvd.nist.gov), the repository for Security Content
Automation Protocol (SCAP) standards, which are used to list software vulnerabilities
and determine their impact. Such assessment tools are intended to develop
and enhance a secure and robust national information infrastructure.
Information policy is sometimes defined de jure, as in the case of the standards
for handling healthcare records. The 1996 HIPAA legislation sets strict (some
would say onerous) limitations on the use and dissemination of personal
health information.
• The Swedish website The Pirate Bay is “one of the world’s largest
facilitators of illegal downloading” (Sarno, 2007); it indexes and
tracks BitTorrent (peer-to-peer file-sharing protocol) files. Access
has been blocked from several European countries and through
Facebook.
References
American Library Association. (2010). About the ALA. Retrieved November 11, 2010, from
www.ala.org/ala/aboutala/offices/pio/mediarelationsa/factsheets/aboutala.cfm.
Braman, S. (2006). The micro- and macroeconomics of information. Annual Review of
Information Science and Technology, 40, 3–52.
Castells, M. (1996). The rise of the network society. Malden, MA: Blackwell.
Devil’s Rope Museum. (2007). Typical brand designs. Retrieved November 11, 2010, from
www.barbwiremuseum.com/Typical_Brand_Designs.htm.
Doctorow, C. (2005, November 12). Sony anti-customer technology roundup and time-line.
BoingBoing.net. Retrieved November 11, 2010, from www.boingboing.net/2005/11/14/
sony-anticustomer-te.html.
Drummond, D. (2010, March 22). A new approach to China: An update. The Official Google
Blog. Retrieved November 11, 2010, from googleblog.blogspot.com/2010/03/new-
approach-to-china-update.html.
Electronic Frontier Foundation. (2008). Bank Julius Baer & Co v. Wikileaks. Electronic Frontier
Foundation. Retrieved November 11, 2010, from www.eff.org/cases/bank-julius-baer-
co-v-wikileaks.
Electronic Frontier Foundation. (2009). Intellectual property. Retrieved November 11, 2010,
from www.eff.org/issues/intellectual-property.
Goolsbee, A. (2000). In a world without borders: The impact of taxes on internet commerce.
Quarterly Journal of Economics, 115(2), 561–576.
IBM. (2008). IBM announces new I.T. standards policy. Retrieved August 4, 2009, from www-
03.ibm.com/press/uk/en/pressrelease/25186.wss.
International Organization for Standardization. (2009a). The scope of ISO’s work. Retrieved
November 11, 2010, from www.iso.org/iso/about/discover-iso_the-scope-of-isos-work.htm.
International Organization for Standardization. (2009b). Why standards matter. Retrieved
November 11, 2010, from www.iso.org/iso/about/discover-iso_why-standards-matter.htm.
Lessig, L. (2002). The future of ideas: The fate of the commons in a connected world. New York:
Vintage.
MacKinnon, R. (2008). Flatter world and thicker walls? Blogs, censorship and civic discourse
in China. Public Choice, 134(1–2), 31–46.
McHugh, J. (2003, January). Google vs. evil. Wired, 11.01. Retrieved November 11, 2010, from
www.wired.com/wired/archive/11.01/google_pr.html.
National Conference of Commissioners on Uniform State Laws. (1985). Uniform trade secrets
act. Retrieved November 11, 2010, from euro.ecom.cmu.edu/program/law/08-732/Trade
Secrets/utsa.pdf.
National Information Standards Organization. (2010). About NISO. Retrieved November 11,
2010, from www.niso.org/about.
Orna, E. (1999). Practical information policies (2nd ed.). Aldershot, UK: Gower.
Poltorak, A., & Lerner, P. (2011). Essentials of intellectual property (2nd ed.). New York: Wiley.
Reuters. (2002, May 20). CD crack: Magic marker indeed. Wired. Retrieved November 11,
2010, from www.wired.com/science/discoveries/news/2002/05/52665.
Rubin, R. (2004). Foundations of library and information science (2nd ed.). New York: Neal-
Schuman Publishers.
Samuelson, P. (2000). Five challenges for regulating the global information society. Retrieved
November 11, 2010, from people.ischool.berkeley.edu/~pam/papers/5challenges_feb22
_v2_final_.pdf.
Sarno, D. (2007, April 29). The internet sure loves its outlaws. Los Angeles Times. Retrieved
November 11, 2010, from www.latimes.com/entertainment/news/la-ca-webscout29apr
29,0,1261622.story?coll=la-home-entertainment.
Shapiro, C., & Varian, H. R. (1997, July 30). US government information policy. Retrieved
November 11, 2010, from people.ischool.berkeley.edu/~hal/Papers/policy/policy.html.
Sreberny, A., & Khiabany, G. (2008). Internet in Iran: The battle over an emerging public
sphere. In M. McLelland & G. Goggin (Eds.), Internationalizing internet studies: Beyond
Anglophone paradigms. New York: Routledge.
Stigler, G. J. (1961). The economics of information. Journal of Political Economy, 69(3),
213–225.
United Nations. (1948). Universal declaration of human rights. Retrieved November 11, 2010,
from www.un.org/en/documents/udhr.
CHAPTER 13
The Information Professions
13.1. Introduction
This chapter considers the institutional roots and professional expectations of
those who choose careers in information science and technology. Historically,
libraries, archives, and museums have encountered and in some ways resolved
many of the issues that face today’s information professional. A review of the
roles these “memory institutions” perform and how they interact permits con-
sideration of their various professional responsibilities and challenges. We
consider the values of information professionals and how these values are
demonstrated in their sense of mission. The chapter concludes with a discus-
sion of the complexities of ethics in the information professions.
Libraries
Library (from the Latin word liber, for book) can mean a collection of infor-
mation resources and associated services or the building in which these
things are housed. Modern libraries’ collections include books as well as
many other materials: maps, sound and video recordings, databases, and
other electronic resources. Librarians determine what to add to the collec-
tions and how to catalog resources so that they are accessible; librarians are
also trained to help people identify the kinds of information they need and to
assist in locating the best sources for the information required.
Public libraries are usually supported by taxes and open to the general pub-
lic. This broad mandate to develop collections and provide services for all—
“regardless of age, education, ethnicity, language, income, physical limitations
or geographic barriers”—reflects librarians’ “core values of the library commu-
nity such as equal access to information, intellectual freedom, and the objec-
tive stewardship and provision of information,” to quote the American Library
Association (2010).
Some public libraries and many academic libraries have developed and
continue to maintain large and historic collections that support scholars and
researchers. Their resources and services often go into greater depth than
those of most public libraries; many collect primary source material (such as
manuscripts or historical documents), as well as published works (such as
books, periodicals, and their electronic counterparts). These libraries are
part of the cycle of scholarly communication: Scholars and researchers con-
tribute new ideas, which are made available via publications; publications
are acquired by libraries; and scholars and researchers use library resources
to develop new ideas (see Chapter 11).
Special libraries support corporations, government agencies, specialized
academic units, or other organizations with in-depth collections and serv-
ices. Special librarians often work closely with researchers in their employing
organizations, such as banks, medical schools, news organizations, and
pharmaceutical companies. Users of these libraries also contribute to
scholarly communication.
School libraries, sometimes called media centers, provide materials and
services for students and teachers in elementary and secondary schools. A
school library aims to support and enhance the curricular goals of the school.
All libraries are major sources of information for their clients; public and
research libraries often have commitments to society at large, as well. As
guardians of the public’s access to information, their roles are changing rap-
idly with the revolutionary developments of the digital world. Libraries are
becoming virtual, in that information technologies allow people to reach the
information they need from almost any place.
Museums
The International Council of Museums (2006) defines a museum as “a non-
profit making permanent institution in the service of society and of its devel-
opment, open to the public, which acquires, conserves, researches,
communicates and exhibits, for purposes of study, education and enjoy-
ment, the tangible and intangible evidence of people and their environment”
(International Council of Museums, under “Glossary”).
Museums provide opportunities for lifelong learning and are stewards of
our cultural heritage. They engage with schools, families, and communities,
connecting the whole of society to the cultural, artistic, historical, natural,
and scientific understandings that constitute our heritage. Museums collect
and conserve tangible objects—animate and inanimate—for the benefit of
future generations (Institute of Museum and Library Services, 2008).
Examples include both governmental and private museums of anthropology,
art history and natural history, aquariums, arboreta, art centers, botanical
gardens, children’s museums, historic sites, nature centers, planetariums,
science and technology centers, and zoos (American Association of
Museums, 2009).
Because of the uniqueness of the items they collect, museums have spe-
cial responsibilities to work closely with the communities from which their
collections originate, as well as the communities they serve. Museums are
held to a standard of stewardship that includes respect for rightful owner-
ship, permanence, documentation, accessibility, and responsible disposal of
items collected (International Council of Museums, 2006).
Archives
Archives are the records a person or organization creates or receives and pre-
serves because of their enduring value (Pearce-Moses, 2005). For a corpora-
tion or a state or nation, these include administrative files, business records,
memos, official correspondence, meeting minutes—sometimes referred to
as the by-products of the organization’s activities. The material in an archive
may be written text on paper, photographs, sound recordings, electronic
records, or other formats. In an archive these permanent records are main-
tained using principles of provenance (keeping separate records from differ-
ent sources, to preserve their context), original order, and collective control.
Archives constitute the memory of nations and of societies, shape their
identity, and are a cornerstone of the information society. By providing evi-
dence of human actions and transactions, archives support administration
and underlie the rights of individuals, organizations, and states. By guaran-
teeing citizens’ rights of access to official information and to knowledge of
their history, archives are fundamental to democracy, accountability, and
good governance (International Council on Archives, 2008).
13.3. Values
Freedom, equal access, and neutrality are fundamental, but contested, val-
ues in information science; they are the topics of many of today’s great
debates. For instance, people’s “right-to-know” has been supported by both
hackers and the American Library Association. Some would argue that pri-
vate matters must be protected from public disclosure, and corporations
would affirm that no one has either a right to know or a right to access pro-
prietary information unless it was purchased on the open market.
Intellectual Freedom
The right to intellectual freedom is stated in Article 19 of the Universal
Declaration of Human Rights: “Everyone has the right to freedom of opinion
and expression; this right includes freedom to hold opinions without inter-
ference and to seek, receive and impart information and ideas through any
media and regardless of frontiers” (United Nations, 1948).
Several professions, including education and librarianship, promote the
safeguarding of this right. For instance, the International Federation of
Library Associations and Institutions (1999) states that “the right to know is a
requirement for freedom of thought and conscience; freedom of thought and
freedom of expression are necessary conditions for freedom of access to
information” (paragraph 3). And the Canadian Library Association (1985)
holds that everyone in that country has “the fundamental right … to have
access to all expressions of knowledge, creativity, and intellectual activity,
and to express their thoughts publicly” (paragraph 1).
European and North American writers debate how traditional intellectual
freedoms can be identified and preserved in the digital age, describing
digital liberties that include access to technology, free exchange of ideas,
the right to privacy, culture sharing, knowledge and skill development,
and emancipation through empowerment.
Some of the values supporting intellectual freedom are culturally influ-
enced and may be at odds with sensitive or proprietary indigenous knowl-
edge. This is evident in anthropological research: Newly
empowered native peoples have found their own voices and claimed the
right of repatriation of artifacts, together with the knowledge associated with
their religious rituals, in order to regain full ownership of their mysteries.
The concept of intellectual freedom has also expanded to software devel-
opment, access, and use. For instance, advocates of free and open source
software contend that computer users have the right to replace proprietary
software (used under restrictive licensing terms and conditions) with free
software (considered a superior model for software development). This
approach to social, ethical, and technical issues has resulted in efforts to pass
legislation encouraging use of free software by government agencies in vari-
ous countries, including the U.S.
Professional codes articulate these values. The ASIST professional guidelines
(American Society for Information Science and Technology, 1992), for exam-
ple, include among members’ responsibilities to society the duty to improve
the information systems with which they work or which they represent, to
the best of their means and abilities.
Some ethical issues related to the use and abuse of information date from
antiquity, such as the persistent issues of censorship and identity theft.
Technological developments have exacerbated some concerns, but the fun-
damental issues are constant. Other issues, such as data mining and the use
of technologies such as radio frequency identification and global positioning
system tracking have raised new questions about the nature of privacy and
anonymity. In many cases, modern ethical questions are not new but have
become more pressing and relevant because of fundamental technological
trends. Technological factors that contribute to or exacerbate problems in
the information society include the exponential growth of processing power
and storage capacity of information handling technology (computers and
network devices), increased incidental data production by individuals (daily
living now results in a much more detailed data trail than it did 25 years ago),
and increasingly sophisticated and wide-reaching data analysis tools.
Because using today’s information and communication technologies can
provide significant economic value, some entities are highly motivated to
refine their techniques in order to increase profit by selling ever more
detailed information about individuals. Such data aggregators and brokers
are controversial; however, many of their sources of information have always
been publicly available and have simply become more convenient with the
advent of widespread information networks. For example, a search of home
sales and tax records once required a trip to the local or state records office;
today a much larger search can be done in less time from any device with
access to the appropriate data repositories. The practice has not changed,
but its speed and convenience have.
Any discussion of information ethics must first consider what constitutes
ethical behavior. Ethics comprises the precepts that free individuals use to
guide their choices and behavior. Without undertaking a study of the history
of ethics, it is useful to consider one of the most influential ethical principles
in Western thought: Immanuel Kant’s categorical imperative, a moral test
that asks: If everyone behaved in this way, could the organization, or society
in general, survive? It is often compared to the Golden Rule, “Do unto others
as you would have them do unto you.”
Smith (1997) noted two dominant philosophical approaches to informa-
tion ethics: utilitarian (largely based on the work of John Stuart Mill) and
deontological (rooted in the work of Immanuel Kant). Several tests have been
proposed for judging the ethics of a particular act, among them:
• The slippery slope: This ethical test asks whether an act’s rightness
or wrongness is defined by its magnitude. (If I make an analog
copy of a recording for a child, is that different from distributing
the recording on the internet? Alternatively, is it more moral to take
a $1 bribe than a $10,000 bribe?)
References
American Association of Museums. (2009). What is a museum? Retrieved November 11,
2010, from www.aam-us.org/aboutmuseums/whatis.cfm.
American Library Association. (2010). Access to information. Retrieved November 11, 2010,
from www.ala.org/ala/issuesadvocacy/access/accesstoinformation/index.cfm.
American Society for Information Science and Technology. (1992). ASIST professional guide-
lines. Retrieved December 21, 2010, from http://www.asis.org/professionalguide
lines.html.
Chen, W., & Wellman, B. (2003) Charting and bridging digital divides: Comparing socio-
economic, gender, life stage and rural-urban internet access and use in eight countries.
Sunnyvale, CA: AMD Global Consumer Advisory Board. Retrieved November 11, 2010,
from www.finextra.com/Finextra-downloads/featuredocs/International_digital_divide.
pdf.
Dijkstra, E. W. (1972, August). Speech accepting the ACM Turing Award. Quotation retrieved
April 24, 2011, from www.quotes.net/quote/12595.
Froehlich, T. (1997). Survey and analysis of legal and ethical issues for library and information
services. UNESCO Report (Contract No. 401.723.4) for the International Federation of
Library Associations. IFLA Professional Series. Munich, Germany: G. K. Saur.
Institute of Museum and Library Services. (2008, December). Exhibiting public value:
Government funding for museums in the United States. Retrieved November 11, 2010,
from www.imls.gov/pdf/MuseumPublicFinance.pdf.
International Council of Museums. (2006). Code of ethics. Retrieved November 11, 2010, from
www.icom.museum/ethics.html.
International Council on Archives. (1996). Code of ethics. Retrieved November 11, 2010, from
www.ica.org/sites/default/files/Ethics-EN.pdf.
International Council on Archives. (2008). Welcome to ICA. Retrieved May 11, 2011, from
www.ica.org/102/about-ica/an-introduction-to-our-organization.html.
International Federation of Library Associations and Institutions. (1999). IFLA statement on
libraries and intellectual freedom. Retrieved November 11, 2010, from www.ifla.org/en/
publications/ifla-statement-on-libraries-and-intellectual-freedom.
Laudon, K. C., & Laudon, J. P. (2007). Management information systems: Managing the digi-
tal firm (10th ed.). Upper Saddle River, NJ: Pearson.
Marty, P. F. (1999). Museum informatics and collaborative technologies: The emerging socio-
technological dimension of information science in museum environments. Journal of
the American Society for Information Science, 50(12), 1083–1091.
Mason, R. O., Mason, F. M., & Culnan, M. J. (1995). Ethics of information management.
Thousand Oaks, CA: Sage.
McFarland & Company, Inc. (2009). JIE call for submissions. Retrieved November 11, 2010,
from www.mcfarlandpub.com/jiesubmissions.html.
Mitcham, C., & Huning, A. (1986). Philosophy and technology II: Information technology and
computers in theory and practice. Dordrecht, Netherlands: Reidel.
Pearce-Moses, R. (2005). A glossary of archival and records terminology. Chicago: Society of
American Archivists. Retrieved November 11, 2010, from www.archivists.org/glossary/
term_details.asp?DefinitionKey=156.
Pearce-Moses, R. (2006, March/April). Identity and diversity: What is an archivist? Archival
Outlook. Retrieved November 11, 2010, from www.archivists.org/periodicals/ao_back
issues/AO-Mar06.pdf.
Smith, M. M. (1997). Information ethics. Annual Review of Information Science and
Technology, 32, 339–366.
Spinello, R. A. (1995). Ethical aspects of information technology. Englewood Cliffs, NJ:
Prentice Hall.
United Nations. (1948). Universal declaration of human rights. Retrieved November 11, 2010,
from www.un.org/en/documents/udhr.
CHAPTER 14
Information Theory
14.1. Introduction
“Theory is not dry abstraction but the body of concerns, methods, and
research problems a discipline develops over time. [It provides] not only
intellectual content, but exposure to conventions governing choice of
research problems, methods, materials, and equipment to use.” This is how
Pierce (1992, p. 641) presented the case for theory; she was writing as a soci-
ologist who had taken the requisite introduction to theory course (which the
students called “Dead Germans”) in that field and lamented the absence of
theory in information science.
One might have thought that, for so important a field, a general theory
would be easy to identify. Although much has been written on various
information systems (online databases, libraries, etc.), the attention is
often limited to one type of system, restricted by technology (usually to
computer-based systems), or focused on one function (such as retrieval)
and disregards the broader context. Writings specifically on theory have
often focused on logic, probability, and physical signals. In addition, the field
has only gradually recognized that the word information is used by different
people to denote different things.
In the 1950s information theory was developed from the statistical theory
of communication. As information science developed, researchers and prac-
titioners have also considered more socially oriented theories such as net-
work theory and social epistemology. Philosophers have also taken an
interest in the provocative challenges of our field.
Shannon’s model analyzes communication into a small set of components:
• The information source produces the message to be communicated.
• The transmitter encodes the message as a signal suited to the channel.
• The channel is the medium that carries the signal; a noise source may
distort the signal in transit.
• The receiver decodes the signal back into a message.
• The destination is the person (or thing) for which the message is
intended.
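A compact statement of Shannon’s central measure may help here; the formula and the coin-toss arithmetic are standard results, restated for orientation. For a source that emits symbol i with probability p_i, the average information per symbol, or entropy, is

    H = -\sum_i p_i \log_2 p_i    (bits per symbol)

A fair coin toss, for example, carries H = -(0.5 log_2 0.5 + 0.5 log_2 0.5) = 1 bit, whereas a coin that comes up heads 99 percent of the time carries only about 0.08 bit, because its outcome is largely predictable.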
A recurring empirical pattern in information systems is the 80/20 rule:
social network analysts find that roughly 80 percent of ties involve 20
percent of actors; web analysts report that 80 percent of web links are
directed to 20 percent of webpages; and in libraries, 80 percent of circulation
is attributable to 20 percent of library holdings. The 80/20 rule is known as
Pareto’s law (named for Italian economist Vilfredo Pareto).
The 80/20 rule displays a property key to understanding complex, scale-
free networks: the power-law distribution. The continuously decreasing
curve described by the rule typifies the distribution of real-world networks
such as the internet, the neural networks of the brain, and the human
genome. Take the web, for example: a bell curve distribution would suggest
that most webpages are roughly equally popular. Instead, relatively few pages
are popular and most are not, indicating a power-law distribution in which
many small events coexist with a few large ones.
The power-law distribution is a mathematical signature of the intercon-
nected universe and suggests that complex networks within the universe are
not entirely random. In other words, there is a degree of order in the universe.
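A small simulation can make the pattern concrete. The sketch below assumes a hypothetical population of 10,000 webpages and a Pareto shape parameter of 1.16 (the value that corresponds to an 80/20 split); both choices are illustrative, not empirical.

    import random

    # Draw "popularity" scores for hypothetical webpages from a
    # heavy-tailed Pareto distribution, then measure what share of all
    # links the most-linked 20 percent of pages attract.
    random.seed(42)
    alpha = 1.16  # shape parameter; about 1.16 yields an 80/20 split
    popularity = sorted(
        (random.paretovariate(alpha) for _ in range(10_000)), reverse=True
    )

    top_fifth = popularity[: len(popularity) // 5]
    share = sum(top_fifth) / sum(popularity)
    print(f"Share of links attracted by the top 20% of pages: {share:.0%}")

Runs of this sketch typically print a share near 80 percent, though individual runs vary because the heavy tail makes the total sensitive to a few very large values; a bell-curve population, by contrast, would give the top fifth a share only modestly above 20 percent.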
In Granovetter’s (1973) study of job-seekers, for example, only about 16 per-
cent found their jobs through strong contacts, whereas 84 percent found
their jobs through contacts they saw occasionally or rarely.
and Ball (2007) claim that it is the basis for the Bliss Bibliographic
Classification:
Epistemology
Epistemology, the theory of knowledge, is important for information science,
as it is in any science or research field. It is closely connected to the different
approaches or paradigms; epistemological views are essentially built into the
research methods accepted by the field. In information science the impor-
tance is doubled: epistemology underlies the approaches used to study infor-
mation, and it shapes views of information itself. Because knowledge
and information are often used interchangeably in information science, it is
obvious that the theory of knowledge must also be important for the theory
of information.
Epistemologies may be characterized by the kind of information that is
found relevant. Hjørland (2002) outlined relevance criteria in four basic epis-
temological theories (see Table 14.1).
Each of these epistemological positions has strong arguments against the
others. The classical rationalist argument against empiricism is that observa-
tions cannot play the sole role (or even the major role) in acquiring knowl-
edge because one cannot experience anything that is not already anticipated
in the inborn capacity to sense and form concepts. Our knowledge about col-
ors, for example, cannot come from experience alone because the ability to
discriminate colors is a prerequisite to experiencing them.
The inherent weakness in the epistemological positions may lead to skep-
ticism or methodological anarchism. Common sense shows, however, that
science is successful in producing knowledge. Thus it is possible to produce
valuable knowledge, and some principles and methods simply are better
than others in describing how this is done. This consideration may contain
an argument for a pragmatic philosophy.
The epistemological positions outlined here are ideal types: they cannot
exist in pure form, but different persons or documents may be more or less
influenced by one or another of the views. Different views of knowledge thus
lead to different approaches to organizing, retrieving, and evaluating
information.
14.6. Conclusion
As an interdisciplinary field, information science continues to draw on theo-
retical insights from many sources. The persistence and continuing utility of
both mathematical models and social perspectives demonstrate the variety
of challenges in understanding and finding coherent solutions for the real-
world problems that information scientists encounter. The emerging interest
in how philosophy can address these problems provides yet another source
of insight.
References
Allen, B. (1996). Information tasks: Toward a user-centered approach to information systems.
San Diego, CA: Academic Press.
Barabási, A. L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus.
Brookes, B. C. (1980). The foundations of information science, Part 1: Philosophical aspects.
Journal of Information Science, 2(3/4), 125–133.
Capurro, R. (1986). Hermeneutik der Fachinformation. Munich, Germany: Karl Alber.
Cawkell, A. E. (1990). The boundaries of information science: Information theory is alive and
well. Journal of Information Science, 16(4), 215–216.
Cronin, B. (2008). The sociological turn in information science. Journal of Information
Science, 34(4), 465–475.
Ellis, D. (1996). The dilemma of measurement in information retrieval research. Journal of
the American Society for Information Science, 47(1), 23–36.
Floridi, L. (2002). What is the philosophy of information? Metaphilosophy, 33(1/2), 123–145.
Goldman, A. (2001). History of social epistemology. Stanford encyclopedia of philosophy.
Retrieved November 11, 2010, from plato.stanford.edu/entries/epistemology-social/#1.
Granovetter, M. S. (1973). The strength of weak ties. American Journal of Sociology, 78(6),
1360–1380.
Herold, K. (Ed.). (2004). The philosophy of information [Special issue]. Library Trends,
52(3), 373–665.
Glossary
background noise. Extraneous signals (or noise) that interfere with sound
transmission or quality and cannot be separated from the desired signal.
data. Facts that result from observations; also signs, symbols, and figures that
usually require context or interpretation for full meaning.
digital divide, global digital divide. An expression of the gap that exists
between people, societies, or nations that have effective access to digital
information and technology and those that do not.
Dublin Core. A metadata element set in the fields of library and information
science that is intended to be used for cross-domain information resource
description. It consists of two levels: simple and qualified. It is named for
Dublin, Ohio, home of OCLC, Inc., and managed by the Dublin Core
Metadata Initiative (DCMI).
keyword index. The most important words from a document, extracted and
placed in an index to represent the document’s content. The terms may be
extracted from any part of the document manually or by computer. See
also KWIC index, KWOC index.
KWIC (Key Word In Context) index. The process of extracting and placing in
an index (usually alphabetical) keyword(s) from a text and retaining some
portion of the context of each term. Most KWIC indexes are compiled
semi-automatically with a computer.
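The mechanics can be sketched in a few lines of Python; the sample titles and the stopword list are invented for the example, and production systems add page or line references and more careful term filtering.

    # A rudimentary KWIC generator: every non-stopword of each title
    # becomes an index entry, printed with its surrounding context.
    STOPWORDS = {"a", "an", "and", "of", "the", "to"}

    def kwic(titles):
        entries = []
        for title in titles:
            words = title.split()
            for i, word in enumerate(words):
                if word.lower() in STOPWORDS:
                    continue
                left = " ".join(words[:i])
                right = " ".join(words[i + 1:])
                entries.append((word.lower(), left, word, right))
        for _, left, word, right in sorted(entries):  # alphabetical by keyword
            print(f"{left:>35}  {word.upper()}  {right}")

    kwic(["Human Behavior and the Principle of Least Effort",
          "The Psycho-Biology of Language"])

Each keyword is capitalized and aligned in a column so a reader can scan the index alphabetically while still seeing the phrase from which the keyword came.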
KWOC (Key Word Out of Context) index. Like a KWIC index, a computer
algorithm selects keywords and a portion of the surrounding text to be
displayed, but each keyword is pulled out of its context and presented
separately, usually as a heading under which the title or other context is
listed.
metadata. Literally, data about data. The descriptive information (e.g., title,
author, subjects covered, location as a webpage) about an information
resource in an information system. Metadata schemas exist for different
kinds of information resources or objects, such as libraries, archives,
spreadsheets, geographic information, and images; eXtensible Markup
Language (XML) is frequently used on the web. See also bibliographic con-
trol; Dublin Core; markup languages; surrogate record.
natural language. Words or signs people develop and use for everyday com-
munication, written or oral.
paradigm. An example serving as a pattern. Thomas Kuhn used the term par-
adigm shift to describe a basic change in assumptions, leading to new pat-
terns of scientific thought that produce scientific revolutions.
Pareto’s law (or principle). Named for Vilfredo Pareto, Italian economist, to
describe a general rule of thumb about many skewed empirical distribu-
tions (e.g., 80 percent of a library’s circulation comes from 20 percent of
the collection; 80 percent of sales come from 20 percent of customers);
also known as the 80/20 rule. It is not a scientific law but a pattern.
Semantic Web. Metadata and other technologies are used to describe the
meanings (semantics) and relationships of data on the World Wide Web. It
extends and enhances the human-readable hyperlinks on the web. This
allows computer applications to connect and make use of the data, with
the appearance of “understanding.”
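A minimal sketch in Python may clarify the idea. The identifiers and facts below are invented, and real Semantic Web data would use full URIs in a standard syntax such as RDF; the principle of joining statements, however, is the same.

    # Semantic Web data in miniature: statements expressed as
    # subject-predicate-object triples that software can join.
    triples = [
        ("ex:report42", "dc:creator", "ex:person7"),
        ("ex:person7", "foaf:name", "Ada Example"),
        ("ex:report42", "dc:title", "A Study of Networks"),
    ]

    # The "understanding" is graph traversal: join two triples on a
    # shared node to find the names of the creators of ex:report42.
    creators = {o for s, p, o in triples
                if s == "ex:report42" and p == "dc:creator"}
    print([o for s, p, o in triples if s in creators and p == "foaf:name"])

The program prints ['Ada Example'] without any built-in knowledge of reports or people; the relationships are carried entirely by the data, which is the Semantic Web’s central aim.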
virtual private network (VPN). A secure channel for data transmission cre-
ated through encryption and “tunneling.”