@XXX - Lanl.gov: First Steps Toward Electronic Research Communication
Paul H. Ginsparg
[email protected] is the e-mail address for the first of a series of automated archives for electronic communication of research information. This "e-print archive" went on-line in August 1991. It began as an experimental means of circumventing recognized inadequacies of research journals. Within a very short period, however, it unexpectedly became the primary means of communicating ongoing research information in formal areas of high energy particle theory. Its rapid acceptance within this community depended critically on both recent technological advances and particular behavioral aspects of the community. There are now more than 3600 regular users of hep-th worldwide.

The archiving software has been expanded to serve a number of other research disciplines (see Figure 1). The extended, automatically maintained database and distribution system currently serves over 20,000 users from more than 60 countries and processes over 30,000 messages per day. It is already one of the largest and most active
Background

rapid-access gigabyte disk drives costing under $1,000 can hold 25,000 papers at an average cost of 4 cents per paper. Slower-access media for archival storage cost even less: a digital audio-tape cartridge, available from discount electronics dealers for under $15, can hold over 4 gigabytes, that is, over 100,000 such papers. The data equivalent of multiple years of most journals is often far less than the amount many experimentalists handle every day. Moreover, the costs of data storage will only continue to decrease. Because storage is so inexpensive, an archive can be duplicated at several distribution points. This minimizes the risk of loss due to accident or catastrophe and facilitates worldwide network access.

The Internet runs 24 hours a day, with virtually no interruptions. It transfers data at rates of up to 45 megabits per second (that is, less than a hundredth of a second per paper). Projected upgrades of NSFnet to a few gigabits per second within a few years should be adequate to accommodate increased usage by the academic community. The commercial networks that will constitute the nation's electronic data highway will have even greater capacity.

These technological advances, combined with a remarkable lack of response to the electronic revolution from conventional journals, rendered the development of e-print archives "an accident waiting to happen." Perhaps more surprising has been the readiness of scientific communities to adopt this new tool of information exchange and to explore its implications for traditional review and publication processes. The exponential growth in archive usage suggests that scientific researchers are not only eager but indeed impatient for completion of the proposed "information superhighways" (though not necessarily the tollbooths of "information turnpikes").

Implementation

Having concluded that an electronic preprint archive was possible in principle, I spent a few afternoons during the summer of 1991 writing the original software. It was designed as a fully automated system in which users construct, maintain, and revise a comprehensive database and distribution network without outside supervision or intervention. The software is rudimentary and allows users with minimal computer literacy to communicate e-mail requests to the Internet address [email protected]. Remote users can submit and replace papers, obtain papers and listings, get help on available commands, search the listings for author names, and so on.

The formal communication provided by an e-print archive should be distinguished from the informal (and unarchived) communication provided by electronic bulletin boards and network news. In the case of an e-print archive, researchers are restricted to communicating by means of abstracts and research papers suitable for publication in conventional research journals. Electronic bulletin boards are more akin to ordinary conversation or written correspondence; that is, they are neither indexed for retrieval nor stored indefinitely. The e-print archives allow submitters to replace their submissions. The program automatically checks database integrity to ensure, for example, that the person replacing a submission is indeed the original submitter. In addition, the system maintains permanent records of submissions and the dates they were submitted, and it records the number of user requests for each paper. Subscribers to the system receive a daily listing of new titles and abstracts (see Figure 2).

The initial user base for hep-th was 160 addresses assembled from pre-existing e-mail distribution lists on two-dimensional gravity and conformal field theory. Within six months the user base grew to encompass most of the workers in formal quantum field theory and string theory. It now includes the 3600 subscribers mentioned above. Its smooth operation has transformed it into an essential research tool; many users have reported their dependence on receiving multiple "fixes" each day. The original hep-th archive now receives roughly 200 new submissions per month, responds to more than 700 e-mail requests per day, and transmits more than 1000 copies of papers on peak days. Internet e-mail access time is typically a few seconds.

The system originally ran as a background job on a small UNIX workstation (a 25-megahertz NeXTstation with a 68040 processor, purchased for roughly $5,000 in 1991). That workstation was primarily used for other purposes by another member of my research group, and the archive placed no noticeable drain on its CPU resources. The system has since been moved to an HP 9000/735 that sits exiled on the floor under a table in a corner.

For those directly on the Internet, the system allows anonymous FTP access to the papers and listings directories. Access can now also be gained through WorldWideWeb for those with the required (public-domain) client software (see Figure 3). Local menu-driven interfaces can be set up to automatically pipe selected papers through text formatters directly to a screen previewer or printer. (Such software has been set up to cache and redistribute papers on many local networks.) The WorldWideWeb interface for the multiple archives at xxx.lanl.gov currently processes over 5000 requests daily (see Figure 4). While that is only a small fraction of the overall usage of the e-print archives, WorldWideWeb is ex-
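The storage and bandwidth figures quoted under Background can be checked with a few lines of arithmetic. This is a sketch under stated assumptions. It takes a gigabyte to be 10^9 bytes and infers the average paper size from the text's own pairing of a gigabyte disk with 25,000 papers. The constant and function names are hypothetical.

```python
"""Back-of-envelope check of the storage and bandwidth figures in the text.

Assumptions (not from the article): 1 gigabyte = 10**9 bytes, and the
average paper size is inferred from "a gigabyte disk holds 25,000 papers".
"""

DISK_BYTES = 1 * 10**9           # one "gigabyte disk drive"
DISK_COST_USD = 1_000            # "costing under $1,000"
PAPERS_PER_DISK = 25_000

TAPE_BYTES = 4 * 10**9           # digital audio-tape cartridge, "over 4 gigabytes"
TAPE_COST_USD = 15               # "under $15"

LINK_BITS_PER_SEC = 45 * 10**6   # "up to 45 megabits per second"


def average_paper_bytes() -> float:
    """Average paper size implied by the disk figures."""
    return DISK_BYTES / PAPERS_PER_DISK


def disk_cost_per_paper_usd() -> float:
    return DISK_COST_USD / PAPERS_PER_DISK


def papers_per_tape() -> float:
    return TAPE_BYTES / average_paper_bytes()


def tape_cost_per_paper_usd() -> float:
    return TAPE_COST_USD / papers_per_tape()


def seconds_per_paper() -> float:
    """Time to send one average paper at the full link rate (no protocol overhead)."""
    return average_paper_bytes() * 8 / LINK_BITS_PER_SEC


if __name__ == "__main__":
    print(f"average paper size : {average_paper_bytes():,.0f} bytes")
    print(f"disk cost per paper: ${disk_cost_per_paper_usd():.2f}")
    print(f"papers per tape    : {papers_per_tape():,.0f}")
    print(f"tape cost per paper: ${tape_cost_per_paper_usd():.5f}")
    print(f"transfer time      : {seconds_per_paper():.4f} s")
```

Under these assumptions the sketch gives about 40,000 bytes per paper, $0.04 per paper on disk, 100,000 papers per tape, and roughly 0.007 seconds per paper. These values are consistent with the "4 cents," "over 100,000," and "less than a hundredth of a second" quoted in the text.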
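The Implementation section describes a fully automated archive driven by e-mail requests. Users can submit, replace, retrieve, search, and list papers, and the system enforces an integrity check on replacements. It also keeps submission dates and per-paper request counts. Below is a minimal Python sketch of such a command-driven archive, offered as an illustration rather than the hep-th software itself. The command words (`put`, `replace`, `get`, `find`, `listing`), the `Title:`/`Authors:`/`Abstract:` submission header format, and all class and method names are hypothetical.

```python
"""Minimal sketch of an automated e-print archive driven by e-mail commands.

Hypothetical throughout: command names, submission format, and class names
are illustrative only and are not taken from the hep-th software.
"""
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Paper:
    number: int
    submitter: str            # e-mail address of the original submitter
    title: str
    authors: str
    abstract: str
    body: str
    submitted: date           # permanent record of the submission date
    replaced: list[date] = field(default_factory=list)  # dates of replacements
    requests: int = 0         # number of user requests for this paper


class EprintArchive:
    def __init__(self) -> None:
        self.papers: dict[int, Paper] = {}
        self.next_number = 1

    def handle(self, sender: str, subject: str, body: str = "",
               today: date | None = None) -> str:
        """Dispatch one incoming message on the command in its subject line."""
        today = today or date.today()
        cmd, _, arg = subject.strip().partition(" ")
        cmd = cmd.lower()
        if cmd == "put":
            return self.submit(sender, body, today)
        if cmd == "replace":
            return self.replace(sender, int(arg), body, today)
        if cmd == "get":
            return self.get(int(arg))
        if cmd == "find":
            return self.find(arg)
        if cmd == "listing":
            return self.listing(today)
        return "commands: put, replace N, get N, find AUTHOR, listing, help"

    @staticmethod
    def _parse(body: str) -> tuple[str, str, str, str]:
        # Assumed format: "Key: value" header lines, a blank line, then the text.
        head, _, text = body.partition("\n\n")
        fields = dict(line.split(": ", 1) for line in head.splitlines() if ": " in line)
        return (fields.get("Title", ""), fields.get("Authors", ""),
                fields.get("Abstract", ""), text)

    def submit(self, sender: str, body: str, today: date) -> str:
        title, authors, abstract, text = self._parse(body)
        number = self.next_number
        self.next_number += 1
        self.papers[number] = Paper(number, sender, title, authors, abstract, text, today)
        return f"paper {number} received"

    def replace(self, sender: str, number: int, body: str, today: date) -> str:
        paper = self.papers.get(number)
        if paper is None:
            return f"no paper {number}"
        # Integrity check: only the original submitter may replace a paper.
        if sender != paper.submitter:
            return f"replace refused: {sender} did not submit paper {number}"
        paper.title, paper.authors, paper.abstract, paper.body = self._parse(body)
        paper.replaced.append(today)
        return f"paper {number} replaced"

    def get(self, number: int) -> str:
        paper = self.papers.get(number)
        if paper is None:
            return f"no paper {number}"
        paper.requests += 1   # per-paper request count
        return paper.body

    def find(self, author: str) -> str:
        hits = [p for p in self.papers.values() if author.lower() in p.authors.lower()]
        return "\n".join(f"{p.number}: {p.title} ({p.authors})" for p in hits)

    def listing(self, day: date) -> str:
        """Listing of the day's new titles and abstracts, as mailed to subscribers."""
        new = [p for p in self.papers.values() if p.submitted == day]
        return "\n\n".join(f"{p.number}: {p.title}\n{p.authors}\n{p.abstract}" for p in new)
```

Every request arrives as a message, and the archive only updates its own records, so no human intervention is needed. That is the property the text emphasizes. A working system would also need real mail parsing, persistent storage, and a subscriber list for the daily mailing.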
recognition of the superior service provided by e-print archives; and third, by the fact that in some of the fields served by e-print archives, it has become customary to provide a paper's electronic-archive index number as a reference rather than a local report number or a published reference.

Prospects and Concerns

The system, in its present form, was not intended to replace journals but only to organize what was once a haphazard and unequal distribution of electronic preprints. It is increasingly used as an electronic journal, however, because retrieving a copy of a paper electronically is more convenient than physically retrieving a paper from a file cabinet. The system minimizes geographic inequalities by eliminating the "boat-mail gap" between continents. Beyond that, it institutes a form of democracy in research wherein access to new results is granted equally to everyone, from beginning graduate students to seasoned operators. It is no longer crucial to have the correct connections or to be on exclusive mailing lists to be kept informed of progress in one's field. The pernicious problem of lost or stolen preprints experienced by some large institutions is also definitively exorcised. The many institutions that have eliminated their hard-copy distribution of preprints have already seen significant savings in time and money. Others have specifically requested that hard copy no longer be sent to them, since electronic distribution has proven reliable and more efficient. Implementing a billing system for the use of an e-print archive would be fairly straightforward. Such archives, however, cost so little to set up and maintain that they can be offered virtually free. Overburdened terminal resources at libraries are not an issue, since access is typically via the terminal or workstation on one's desk or in the nearest computer room.

Electronic research archives will prove particularly useful for new and emerging interdisciplinary areas of research for which there are no existing print journals and for which information is consequently difficult to obtain. In many such cases, it is advantageous to avoid a proliferation of premature or ill-considered new journals. Cross-linking of various databases provides an immediate virtual meeting ground for researchers who wouldn't ordinarily communicate with one another. Researchers can quickly establish their own dedicated electronic archive when it is appropriate, and ultimately disband it if things do not pan out. They can do all this with far greater ease and flexibility than traditional publication media provide.

Electronic access to scientific research will be a major boon to developing countries, since the expense of connecting to an existing network is infinitesimal compared with that of constructing, stocking, and maintaining libraries. (I frequently receive messages from physicists in developing countries confirming how much better off they find themselves, even in the short term, with the advent of electronic distribution systems: they are no longer "out of the loop." Others report feeling that their own research gets a more equitable reading, since it is no longer dismissed for the superficial reasons of low-quality printing or paper stock.) Now that much of the technology has ripened, Eastern European and third-world nations may rapidly develop their electronic infrastructures to the level that took developed nations over a decade to reach. At that level, data-transmission lines are as common as telephone service, and terminals and laser printers are as common as typewriters and copy machines. (Similar comments apply equally to the less well-endowed institutions in the U.S. The changes experienced by physics and biology departments are soon to be repeated by the full range of conventional academic institutions, including teaching hospitals, law schools, humanities departments, and ultimately public libraries and public grade schools.)

E-print archives will eventually bring great changes to the scientific-journal industry as well. Over the past decade publication companies have been somewhat irresponsible. They have increased both the number of journals and the subscription price per journal (some single-journal subscriptions to libraries now run well over $10,000 per year) during a period when libraries are experiencing a decrease in both funds and space. Publishers have been slow to incorporate electronic communication into their operation and distribution, although such a move would ultimately result in dramatic savings in cost and time for all involved.

Some members of the community have voiced their concern that electronic distribution will somehow increase the number of preprints produced, or encourage dissemination of preliminary or incorrect material. This concern, however, confuses the method of production with the method of distribution; most researchers are already producing at saturation. Moreover, once posted to an archive, the electronic form is instantly publicized to thousands of people, so the embarrassment over incorrect results is, if anything, increased. Such submissions cannot be removed. They can only be replaced by a note that the work has been withdrawn as incorrect, which leaves a more permanent blemish than a hard copy of limited distribution that is soon forgotten.

The widespread use of e-print archives does not necessarily make refereed forums obsolete. In some disciplines, the refereeing process plays a useful role in improving the quality of published work, filtering out large amounts of irrelevant or incorrect material, and validating research for the purpose of job and grant allocation. A refereeing mechanism could easily be implemented for the e-print archives. It could take the form of either a filter prior to electronic distribution or a review after submission by volunteer readers and/or selected reviewers. In either case, the archives could be partitioned into one or more levels of refereed and unrefereed sectors. Thus, lifting the artificial financial constraints on dissemination of information and decoupling it from the traditional refereeing process will allow for more innovative methods of identifying and validating significant research.

Problems may arise, however, as computer networking spreads outside of the academic community. For example, hep-th would be somewhat less useful if it were to become inundated by submissions from "crackpots" promoting their perpetual-motion machines. The architecture of the information highways of the future will clearly have to reimplement, somehow, the protective physical and social isolation currently enjoyed by ivory towers and research laboratories.

Increased standardization of networking software and electronic storage formats during the 1990s encourages us to fantasize about other possible enhancements to scholarly research communication. In particular, we can imagine discussion "threads" in which users respond to one another's comments on a specific topic. Usenet newsgroups are unlikely to prove adequate for serious purposes, for reasons such as their lack of indexing and archiving and their open nature. On the other hand, it is now technically simple to implement a WorldWideWeb form-based submission system to build hyperlinked threads. These threads would be accessible from given points in individual papers and could also be started from a subject-based linked discussion page. All posted text could be indexed by the WAIS (Wide Area Information Server) scheme for easy retrieval. Related threads could interleave and cross-link in a natural manner, with standard methods for moving forward and backtracking. A histogram-like interface showing the activity on each thread would make it easier to find threads of current interest. The index could also locate all postings by a given person (including oneself), together with the date of the latest follow-up, to facilitate tracking of responses. This would provide a much more flexible format than Usenet. It would specifically avoid awkward protocols for group creation and removal, as well as potentially unscalable aspects of nntp (the network news transfer protocol). For the relatively circumscribed physics research community, a central database (copied onto many nodes, as usual) would have no difficulty with storage or access bandwidth.

To enable full-fledged research communication with in-line equations or other linkages, we require slightly higher-quality browsers than are currently available. But hypertext transfer protocols (http) are now relatively standardized. As a result, network links and links to other application software can be built into underlying TeX documents (and configured into standard macro packages). These documents can then be either interpreted by dedicated TeX previewers or passed by a suitable driver into more archival formats (such as Adobe Acrobat PDF) for greater portability across platforms. Multi-component messages could also be assembled in a graphical user interface for composing MIME (multipurpose internet mail extension) messages. These messages could then be piped to the server by means of the http POST protocol, thereby circumventing some of the inconvenient baggage of Internet sendmail or FTP protocols.

While the above is technically straightforward to implement, there remains the aforementioned issue of limiting access. The goal is to emulate the effective insulation from unwanted incursions afforded by corridors and seminar rooms at universities and research laboratories. One method would be to employ a "seed" mechanism. It would start from a given set of "trusted users" and let them authorize others (and effectively be responsible for those beneath them in the tree). It would follow guidelines such as requiring that new users have doctorates or be doctoral candidates, with permission to post or authorize revocable at any time, retroactively one level back in the tree. To allow global coverage, an application to the top level for authorization could be allowed to start a new branch. The scheme entails some obvious compromises, and other schemes are easily envisioned. The ultimate object, however, remains to determine the optimal level of filtering for input access. That level should maintain an auspicious signal-to-noise ratio for those research communities that prefer to be buffered from the outside world. This would constitute an incipient "virtual communication corridor." It would further facilitate useful research communication in what formerly constituted both pre- and post-publication phases, and render individual researchers' physical location ever more irrelevant.

Finally, we mention that the e-print archives in their current incarnation already serve as surprisingly effective inducements for computer literacy, and they have motivated some dramatic changes in computer usage. Researchers who previously disdained computers now confess an addiction to e-mail. Many researchers who for years had refused to switch to UNIX or to TeX are in the process of converting; others have suddenly discovered the power of browsing with WorldWideWeb. The system's effectiveness in motivating these changes justifies the philosophy of providing dual functionality. Power users on the cutting edge get top-of-the-line search, retrieval, and input capabilities, while "lowest-common-denominator" capabilities are maintained for the less "network-fortunate."

Conclusions and Open Questions

These systems are still primitive, and they represent only tentative first steps in the optimal direction. To summarize, thus far we have learned the following:

• The exponential increase in usage of electronic networking over the past few years opens new possibilities for both formal and informal communication of research information.

• In some fields of science, electronic preprint archives have been on-line since mid-1991 and have become the primary means of communicating research information to many thousands of researchers within the fields they serve. It has been established that people will voluntarily subscribe to receive information from these systems and will make aggressive use of them if they are set up properly. Such systems are expected to grow and evolve rapidly in the next few years.

• From such experimental systems, we have learned that open (unrefereed) distribution of research information can work well for some disciplines and has advantages for researchers in both developed and developing countries. We have also learned that the technology and network connectivity are currently adequate to support such systems, and their performance should benefit from continuing improvements in technology.
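The discussion-thread index proposed under Prospects and Concerns has three concrete features. First, a histogram of activity per thread. Second, links from each posting to its replies for moving forward and backtracking. Third, a per-person list of postings with the date of the latest follow-up. A small in-memory sketch of those three features follows. The class names and the text-bar histogram are hypothetical, and the sketch does not attempt WAIS indexing or replication across nodes.

```python
"""Sketch of the discussion-thread index imagined in the text.

Hypothetical: the Posting/ThreadIndex names, the text histogram, and the
in-memory storage stand in for the indexed central database the article
envisions.
"""
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Posting:
    pid: int
    thread: str
    author: str
    posted: date
    text: str
    reply_to: int | None = None   # parent posting, used for backtracking


class ThreadIndex:
    def __init__(self) -> None:
        self.postings: dict[int, Posting] = {}
        self.by_thread: dict[str, list[int]] = defaultdict(list)
        self.replies: dict[int, list[int]] = defaultdict(list)  # forward links

    def post(self, p: Posting) -> None:
        self.postings[p.pid] = p
        self.by_thread[p.thread].append(p.pid)
        if p.reply_to is not None:
            self.replies[p.reply_to].append(p.pid)

    def activity_histogram(self, width: int = 20) -> str:
        """One bar per thread, busiest first, scaled so the busiest has `width` marks."""
        counts = {t: len(ids) for t, ids in self.by_thread.items()}
        top = max(counts.values(), default=1)
        lines = []
        for thread, n in sorted(counts.items(), key=lambda kv: -kv[1]):
            bar = "#" * max(1, round(width * n / top))
            lines.append(f"{thread:<20} {bar} {n}")
        return "\n".join(lines)

    def latest_followup(self, pid: int) -> date | None:
        """Date of the most recent reply anywhere below posting `pid`, if any."""
        latest = None
        stack = list(self.replies.get(pid, []))
        while stack:
            child = self.postings[stack.pop()]
            if latest is None or child.posted > latest:
                latest = child.posted
            stack.extend(self.replies.get(child.pid, []))
        return latest

    def postings_by(self, author: str) -> list[tuple[int, str, date | None]]:
        """All postings by one person, each with its latest follow-up date."""
        return [(p.pid, p.thread, self.latest_followup(p.pid))
                for p in self.postings.values() if p.author == author]
```

Replies are stored as forward links from parent to child, so following a thread forward and walking back up through `reply_to` both work with plain dictionary lookups.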
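The text suggests assembling multi-component MIME messages and sending them to the server by http POST instead of through sendmail or FTP. This can be illustrated with Python's standard `email` and `urllib.request` modules. It is a sketch under stated assumptions. The upload URL, the file names, the `submission` subject line, and the choice of a `message/rfc822` request body are all hypothetical. The request is only built here, not sent.

```python
"""Sketch: a multi-component MIME submission wrapped in an HTTP POST request.

Hypothetical: URL, file names, subject line, and the message/rfc822 body
type are illustrative assumptions; nothing is sent over the network.
"""
from __future__ import annotations

import urllib.request
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_submission(submitter: str, tex_source: str,
                     figures: dict[str, bytes]) -> MIMEMultipart:
    """Bundle a TeX source file and PostScript figures into one MIME message."""
    msg = MIMEMultipart("mixed")
    msg["From"] = submitter
    msg["Subject"] = "submission"
    tex = MIMEText(tex_source, "plain", "utf-8")
    tex.add_header("Content-Disposition", "attachment", filename="paper.tex")
    msg.attach(tex)
    for name, data in figures.items():
        part = MIMEApplication(data, "postscript")   # application/postscript
        part.add_header("Content-Disposition", "attachment", filename=name)
        msg.attach(part)
    return msg


def make_post(url: str, msg: MIMEMultipart) -> urllib.request.Request:
    """Wrap the serialized message in an HTTP POST request (not sent here)."""
    return urllib.request.Request(
        url,
        data=msg.as_bytes(),
        headers={"Content-Type": "message/rfc822"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would actually transmit it. That step is left out because the endpoint is hypothetical.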
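The "seed" mechanism for limiting access is essentially a tree of sponsorships. The sketch below is one possible reading of it, not a specification. It assumes each user has a single sponsor and models the doctorate guideline as a boolean supplied by the sponsor. It also interprets revocation "retroactive one level back in the tree" as suspending the sponsor's right to authorize further users. All names are hypothetical.

```python
"""Sketch of the "seed" authorization tree described in the text.

Assumptions (not specified in the article): one sponsor per user; the
doctorate guideline is a boolean; "retroactive one level back" means the
sponsor of a revoked user loses the right to authorize. Names are hypothetical.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    sponsor: str | None              # None for seed users and new branch roots
    can_post: bool = True
    can_authorize: bool = True
    children: list[str] = field(default_factory=list)


class SeedTree:
    def __init__(self, trusted: list[str]) -> None:
        # The initial "trusted users" are the roots of the tree.
        self.users: dict[str, User] = {n: User(n, None) for n in trusted}

    def authorize(self, sponsor: str, newcomer: str, meets_guidelines: bool) -> bool:
        """Sponsor vouches for a newcomer (e.g. a doctorate or doctoral candidacy)."""
        s = self.users.get(sponsor)
        if s is None or not s.can_authorize or newcomer in self.users:
            return False
        if not meets_guidelines:
            return False
        self.users[newcomer] = User(newcomer, sponsor)
        s.children.append(newcomer)
        return True

    def apply_to_top(self, newcomer: str, approved: bool) -> bool:
        """A direct application to the top level, if approved, starts a new branch."""
        if not approved or newcomer in self.users:
            return False
        self.users[newcomer] = User(newcomer, None)
        return True

    def revoke(self, name: str) -> None:
        """Revoke posting and authorizing rights, reaching one level back."""
        u = self.users[name]
        u.can_post = False
        u.can_authorize = False
        if u.sponsor is not None:
            # Assumed reading of "retroactive one level back": the sponsor,
            # being responsible for this user, may no longer authorize others.
            self.users[u.sponsor].can_authorize = False

    def may_post(self, name: str) -> bool:
        u = self.users.get(name)
        return u is not None and u.can_post
```

Under this reading, users already authorized by a suspended sponsor keep their own rights. Whether revocation should instead cascade down the tree is one of the compromises the scheme leaves open.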
I conclude with some unanswered questions to amplify some of my earlier comments:

• Who will ultimately be the prime beneficiaries of electronic research communication (that is, researchers, publishers, libraries, or other providers of network resources)?

• What factors influence research communities in their rate and degree of acceptance of electronic technology, and what mechanisms are effective in facilitating such changes?

• What role will be played by the conventional peer-refereeing process in the electronic media, and how will it differ from field to field?

• What role will be played by publishing companies, and how large will their profits be? If publication companies do adopt fully electronic distribution, will they pass along to their subscribers the reduced costs that come from more efficient production and distribution? Can publishing companies provide more value than an unmanned automated system whose primary virtue is instant retransmission?

• What role will be played by library systems? (Will information be channeled through libraries or, instead, directly to researchers?)

• How will copyright law be applied to material that exists only in electronic form? At the moment publishing companies are "looking the other way." They live with the dissemination of electronic preprint information as they did with the earlier preprinted form, claiming that it would be antithetical to their philosophy to impede dissemination of information. Will they continue to be so magnanimous when libraries begin to cancel journal subscriptions?

• What storage formats and network utilities are best suited for archiving and retrieving information? Currently we use a combination of e-mail, anonymous FTP, and window-oriented utilities, such as Gopher and WorldWideWeb, combined with WAIS indexing to retrieve TeX and PostScript documents. Will something even better (for example, Acrobat or some other format currently under development) soon merge with the above or emerge as a new standard?

• How will the medium itself evolve? Conservatively, we can imagine "interactive" journals in which equations can be manipulated, solved, or graphed. In such journals, citations could instantly open references to the relevant page, and comments and errata could be dated, keyed to the relevant text, and inserted as electronic "post-it notes" in the margins. Ultimately we will have a multiply interconnected network hypertext system with transparent pointers among distributed databases. Such a system would transcend the limits of conventional journals in structure, content, and functionality, thereby transforming the very nature of what is communicated. These are the kinds of benefits for which we should certainly be willing to pay. We do not, however, wish to clone current journal formats (determined as they are by the constraints of the print medium) in the electronic medium; we are already capable of distinguishing information content from superficial appearance. Who will decide the standards required to implement any such progress?

This began for me as a spare-time project to test the design and implementation of an electronic preprint distribution system for my own relatively small research community. Its feasibility had been the subject of contentious dispute, and even its proponents thought its realization was several years in the future. Its success has led to an unexpectedly enormous growth in usage. It has expanded into other fields of research and has elicited interest from many others; I have received over one hundred inquiries about setting up archives for different disciplines. Each discipline will have slightly different hardware and software requirements, but the current system can be used as a provisional platform that can be tailored to the specific needs of different communities. Despite the success of this project, for three years it remained a spare-time project with little financial or logistical support. Only very recently have the Laboratory, certain government funding agencies, and certain professional societies moved to increase their levels of involvement. Further development will require coordination among interested researchers from various disciplines, computer and networking staff, and interested library personnel. In particular, it will require dedicated staffing. At the moment, hardware and software maintenance of existing automated archives remains a loosely coordinated volunteer operation. Little further progress can be made on the issues raised by the current systems without some thoughtful direction. Perhaps the centralized databases and further software development will ultimately be administered and systematized by established publishing institutions, if they are prescient enough to reconfigure themselves for the inevitable. Since it has been researchers who have taken the lead thus far, however, we should retain this unique op-
Acknowledgements

Many people have contributed (consciously or otherwise) to the development of these systems. The original distribution list from which hep-th sprang in 1991 was assembled by Joanne Cohn. Her incipient efforts demonstrated that members of this community were eager for electronic distribution (and Stephen Shenker recommended that the original archive name not include the string "string"). Continual improvements have been based on feedback from users too numerous to credit, although among the most vocal have been Tanmoy Bhattacharya (T-8), Jacques Distler, Marek Karliner, and Paul Mende. People who have administered some of the remote-based archives include Dave Morrison, Bob Edwards, Roberto Innocente, Erica Jen (T-7), and Bob Parks. Joseph A. Carlson (T-5) and David Thomas set up the original Gopher interfaces in late 1992. The Network Operations Center at Los Alamos National Laboratory has reliably and uncomplainingly supplied the requisite network bandwidth @lanl.gov, and Joseph H. Kleczka (C-8) has been available for crisis control. Louise Addis and the staff at the SLAC library moved quickly to incorporate e-print information into the SPIRES database, furthering their decades of tireless electronic service to the high-energy physics community. Dave Forslund (Advanced Computing Laboratory) and Richard Luce (CIC-14) helped lobby for support from within the Laboratory, and the Advanced Computing Laboratory has in addition provided some logistical and moral support. Finally, Geoffrey B. West (T-8) repeatedly and against all obvious reason insisted that the Los Alamos National Laboratory is an appropriate sponsor for this activity. At the same time, he bore the bad news both from within the Laboratory and from certain government funding agencies.

Paul H. Ginsparg received his A.B. in physics from Harvard University in 1977 and his Ph.D. in physics from Cornell University in 1981 under the direction of Kenneth G. Wilson. He then joined the physics department at Harvard University as a Junior Fellow, eventually becoming an Associate Professor. In 1990 he came to the Elementary Particles and Field Theory Group in the Laboratory's Theoretical Division, where he carries out research in relativistic quantum field theory.