Avoiding Technological Quicksand:
Finding a Viable Technical Foundation
for Digital Preservation
A Report to
the Council on Library and Information Resources
by Jeff Rothenberg
January 1999
Commission on Preservation and Access
The Commission on Preservation and Access, a program of the Council on Library and Information
Resources, supports the efforts of libraries and archives to save endangered portions of their paper-based
collections and to meet the new preservation challenges of the digital environment. Working with
institutions around the world, CLIR disseminates knowledge of best preservation practices and promotes
a coordinated approach to preservation activity.
Digital Libraries
The Digital Libraries program of the Council on Library and Information Resources is committed to
helping libraries of all types and sizes understand the far-reaching implications of digitization. To that
end, CLIR supports projects and publications whose purpose is to build confidence in, and increase
understanding of, the digital component that libraries are now adding to their traditional print holdings.
ISBN 1-887334-63-7
Published by the Council on Library and Information Resources. Additional copies are available for $20.00. Orders must be
prepaid, with checks made payable to the Council on Library and Information Resources.
Copyright 1999 by the Council on Library and Information Resources. No part of this publication may be reproduced
or transcribed in any form without permission of the publisher. Requests for reproduction for noncommercial
purposes, including educational advancement, private study, or research will be granted. Full credit must be given to
both the author and the Council on Library and Information Resources.
Executive Summary
The vision of creating digital libraries that will be able to preserve our heritage currently rests on technological quicksand.
There is as yet no viable long-term strategy to ensure that
digital information will be readable in the future. Not only are digital
documents vulnerable to loss via media decay and obsolescence, but
they become equally inaccessible and unreadable if the software
needed to interpret them—or the hardware on which that software
runs—is lost or becomes obsolete.
This report explores the technical depth of this problem, analyz-
es the inadequacies of a number of ideas that have been proposed as
solutions, and elaborates the emulation strategy, which is, in my
view, the only approach yet suggested to offer a true solution to the
problem of digital preservation (Rothenberg 1995a). Other proposed
solutions involve printing digital documents on paper, relying on
standards to keep them readable, reading them by running obsolete
software and hardware preserved in museums, or translating them
so that they “migrate” into forms accessible by future generations of
software. Yet all of these approaches are short-sighted, labor-inten-
sive, and ultimately incapable of preserving digital documents in
their original forms. Emulation, on the other hand, promises predict-
able, cost-effective preservation of original documents, by means of
running their original software under emulation on future computers.
2. The Digital Longevity Problem
Documents, data, records, and informational and cultural artifacts of
all kinds are rapidly being converted to digital form, if they were not
created digitally to begin with. This rush to digitize is being driven
by powerful incentives, including the ability to make perfect copies
of digital artifacts, to publish them on a wide range of media, to dis-
tribute and disseminate them over networks, to reformat and convert
them into alternate forms, to locate them, search their contents, and
retrieve them, and to process them with automated and semi-auto-
mated tools. Yet the longevity of digital content is problematic for a
number of complex and interrelated reasons (UNACCIS 1990, Lesk
1995, Morris 1998, Popkin and Cushman 1993, Rothenberg 1995b,
Getty 1998).
4. The Scope of the Problem
The preservation of digital documents is a matter of more than purely
academic concern. A 1990 House of Representatives report cited a
number of cases of significant digital records that had already been
lost or were in serious jeopardy of being lost (U. S. Congress 1990),
and the 1997 documentary film Into the Future (Sanders 1997) cited
additional cases (Bikson 1994, Bikson and Law 1993, Fonseca, Polles
and Almeida 1996, Manes 1998, NRC 1995).
In its short history, computer science has become inured to the
fact that every new generation of software and hardware technology
entails the loss of information, as documents are translated between
incompatible formats (Lesk 1992). The most serious losses are caused
cable to preserving digital documents far into the future, and the
techniques employed may not even be generalizable to solve similar
urgent problems that may arise in the future. These short-term ef-
forts therefore do not provide much leverage, in the sense that they
are not replicable for different document types, though they may still
be necessary for saving crucial records. In the medium term, organi-
zations must quickly implement policies and technical procedures to
prevent digital records from becoming vulnerable to imminent loss
in the near future. For the vast bulk of records—those being generat-
ed now, those that have been generated fairly recently, or those that
have been translated into formats and stored on media that are cur-
rently in use—the medium-term issue is how to prevent these
records from becoming urgent cases of imminent loss within the next
few years, as current media, formats, and software evolve and be-
come obsolete.
In the long term (which is the focus of this report), it is necessary
to develop a truly long-lived solution to digital longevity that does
not require continual heroic effort or repeated invention of new ap-
proaches every time formats, software or hardware paradigms, doc-
ument types, or recordkeeping practices change. Such an approach
must be extensible, in recognition of the fact that we cannot predict
future changes, and it must not require labor-intensive (and error-
prone) translation or examination of individual records. It must han-
dle current and future records of unknown type in a uniform way,
while being capable of evolving as necessary.
Though media problems are far from trivial, they are but the tip of
the iceberg. Far more problematic is the fact that digital documents
are in general dependent on application software to make them ac-
cessible and meaningful. Copying media correctly at best ensures
that the original bit stream of a digital document will be preserved.
But a stream of bits cannot be made self-explanatory, any more than
hieroglyphics were self-explanatory for the 1,300 years before the
discovery of the Rosetta Stone. A bit stream (like any stream of sym-
bols) can represent anything: not just text but also data, imagery, au-
dio, video, animated graphics, and any other form or format, current
or future, singly or combined in a hypermedia lattice of pointers
whose formats themselves may be arbitrarily complex and idiosyn-
cratic. Without knowing what is intended, it is impossible to deci-
pher such a stream. In certain restricted cases, it may be possible to
decode the stream without additional knowledge: for example, if a
bit stream is known to represent simple, linear text, deciphering it is
amenable to cryptographic techniques. But in general, a bit stream
can be made intelligible only by running the software that created it,
or some closely related software that understands it.
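To make this concrete, the following purely illustrative sketch (in Python, using an invented eight-byte stream) reads one and the same sequence of bits as text, as integers, as floating-point numbers, and as pixel values; nothing in the bits themselves indicates which reading was intended.

    # A single, invented 8-byte bit stream used purely for illustration.
    import struct

    stream = bytes([0x48, 0x69, 0x21, 0x00, 0x00, 0x00, 0x80, 0x3F])

    # Reading 1: Latin-1 text (every byte is a legal character code).
    as_text = stream.decode("latin-1")
    # Reading 2: two 32-bit little-endian integers.
    as_ints = struct.unpack("<2i", stream)
    # Reading 3: two 32-bit little-endian floating-point numbers.
    as_floats = struct.unpack("<2f", stream)
    # Reading 4: a 2 x 4 "image" of 8-bit grey-scale pixel values.
    as_pixels = [list(stream[:4]), list(stream[4:])]

    print(repr(as_text))   # 'Hi!' followed by control bytes and '?'
    print(as_ints)         # (2189640, 1065353216)
    print(as_floats)       # (a tiny denormal value, 1.0)
    print(as_pixels)       # [[72, 105, 33, 0], [0, 0, 128, 63]]

Without outside knowledge of the intended format, there is no way to choose among these readings.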
This point cannot be overstated: in a very real sense, digital doc-
uments exist only by virtue of software that understands how to ac-
cess and display them; they come into existence only by virtue of
running this software.
When all data are recorded as 0s and 1s, there is, essential-
ly, no object that exists outside of the act of retrieval. The
demand for access creates the ‘object,’ that is, the act of
As this statement implies, the only reliable way (and often the
only possible way) to access the meaning and functionality of a digi-
tal document is to run its original software—either the software that
created it or some closely related software that understands it
(Swade 1998). Yet such application software becomes obsolete just as
fast as do digital storage media and media-accessing software. And
although we can save obsolete software (and the operating system
environment in which it runs) as just another bit stream, running
that software requires specific computer hardware, which itself be-
comes obsolete just as quickly. It is therefore not obvious how we can
use a digital document’s original software to view the document in
the future on some unknown future computer (which, for example,
might use quantum rather than binary states to perform its computa-
tions). This is the crux of the technical problem of preserving digital
documents.
Any technical solution must also be able to cope with issues of cor-
ruption of information, privacy, authentication, validation, and pre-
serving intellectual property rights. This last issue is especially com-
plex for documents that are born digital and therefore have no single
original instance, since traditional notions of copies are inapplicable
to such documents. Finally, any technical solution must be feasible in
terms of the societal and institutional responsibilities and the costs
required to implement it.
6. The Inadequacy of Most Proposed Approaches
Most approaches that have been proposed fall into one of four categories:
(1) reliance on hard copy, (2) reliance on standards, (3) reliance
on computer museums, or (4) reliance on migration. Though
some of these may play a role in an ultimate solution, none of them
comes close to providing a solution by itself, nor does their combina-
tion.
The approach that most institutions are adopting (if only by default)
is to expect digital documents to become unreadable or inaccessible
as their original software becomes obsolete and to translate them
into new forms as needed whenever this occurs (Bikson and Frink-
ing 1993, Dollar 1992). This is the traditional migration approach of
computer science. While it may be better than nothing (better than
having no strategy at all or denying that there is a problem), it has
little to recommend it.4
4 The migration approach is often linked to the use of standards, but standards are not intrinsically a part of migration.
Migration is by no means a new approach: computer scientists,
data administrators and data processing personnel have spent de-
cades performing migration of data, documents, records, and pro-
grams to keep valuable information alive and usable. Though it has
been employed widely (in the absence of any alternative), the nearly
universal experience has been that migration is labor-intensive, time-
consuming, expensive, error-prone, and fraught with the danger of
losing or corrupting information. Migration requires a unique new
solution for each new format or paradigm and each type of document that is to be converted into that new form. Since every para-
digm shift entails a new set of problems, there is not necessarily
much to be learned from previous migration efforts, making each
migration cycle just as difficult, expensive, and problematic as the
last. Automatic conversion is rarely possible, and whether conver-
sion is performed automatically, semiautomatically, or by hand, it is
very likely to result in at least some loss or corruption, as documents
are forced to fit into new forms.
As has been proven repeatedly during the short history of com-
puter science, formats, encodings, and software paradigms change
often and in surprising ways. Of the many dynamic aspects of infor-
mation science, document paradigms, computing paradigms, and
software paradigms are among the most volatile, and their evolution
routinely eludes prediction. Relational and object-oriented databases,
spreadsheets, Web-based hypermedia documents, e-mail attach-
ments, and many other paradigms have appeared on the scene with
relatively little warning, at least from the point of view of most com-
puter users. Each new paradigm of this kind requires considerable
conversion of programs, documents, and work styles, whether per-
formed by users themselves or by programmers, data administra-
tors, or data processing personnel.
Even though some new paradigms subsume the ones they re-
place, they often still require a significant conversion effort. For ex-
ample, the spreadsheet paradigm subsumes simple textual tables,
but converting an existing table into a meaningful spreadsheet re-
quires defining the formulas that link the entries in the table, al-
though these relationships are likely to have been merely implicit in
the original textual form (and long since forgotten). Similarly, word
processing subsumes simple text editing, but conversion of a docu-
ment from a simple textual form into a specific word processing for-
mat requires that fonts, paragraph types, indentation, highlighting,
and so forth, be specified, in order to make use of the new medium
and avoid producing a result that would otherwise be unacceptably
old-fashioned, if not illegible.
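A deliberately simple, invented example illustrates how such implicit relationships must be supplied by the converter rather than recovered from the document itself; the values and formula below are hypothetical.

    # A plain textual table as it might survive in an old document.
    # Whether the last column equals quantity * unit price is nowhere recorded.
    rows = [
        ("widgets", 10, 2.50, 25.00),
        ("gadgets",  4, 3.00, 12.60),  # may include an adjustment the converter knows nothing about
    ]

    def assumed_total(quantity, unit_price):
        """Formula supplied by the converter, not by the original document."""
        return quantity * unit_price

    # Converting to a spreadsheet-like form forces a choice: copy the numbers
    # verbatim (losing any relationship) or assert a formula that the original
    # text never stated, which may silently conflict with the stored values.
    for name, qty, price, total in rows:
        recomputed = assumed_total(qty, price)
        status = "consistent" if abs(recomputed - total) < 1e-9 else "CONFLICT"
        print(f"{name}: stored total {total}, recomputed {recomputed} ({status})")

In this invented case the asserted formula reproduces one row and contradicts the other, which is exactly the kind of silent corruption migration invites.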
One of the worst aspects of migration is that it is impossible to
predict what it will entail. Since paradigm shifts cannot be predicted,
they may necessitate arbitrarily complex conversion for some or all
digital documents in a collection. In reality, of course, particularly
complex conversions are unlikely to be affordable in all cases, lead-
ing to the abandonment of individual documents or entire corpora
when conversion would be prohibitively expensive.
In addition, as when refreshing media, there is a degree of ur-
gency involved in migration. If a given document is not converted
when a new paradigm first appears, even if the document is saved in
its original form (and refreshed by being copied onto new media),
the software required to access its now-obsolete form may be lost or
become unusable due to the obsolescence of the required hardware,
making future conversion difficult or impossible. Though this urgen-
cy is driven by the obsolescence of software and hardware, rather
than by the physical decay and obsolescence of the media on which
7. Criteria for an Ideal Solution
In contrast to the above strategies, an ideal approach should provide
a single, extensible, long-term solution that can be designed once
and for all and applied uniformly, automatically, and in synchrony
(for example, at every future refresh cycle) to all types of documents
and all media, with minimal human intervention. It should provide
maximum leverage, in the sense that implementing it for any docu-
ment type should make it usable for all document types. It should
facilitate document management (cataloging, deaccessioning, and so
forth) by associating human-readable labeling information and meta-
data with each document. It should retain as much as desired (and
feasible) of the original functionality, look, and feel of each original
document, while minimizing translation so as to minimize both la-
bor and the potential for loss via corruption. If translation is un-
avoidable (as when translating labeling information), the approach
should guarantee that this translation will be reversible, so that the
original form can be recovered without loss.
The ideal approach should offer alternatives for levels of safety
and quality, volume of storage, ease of access, and other attributes at
varying costs, and it should allow these alternatives to be changed
for a given document, type of document, or corpus at any time in the
future. It should provide single-step access to all documents, without
requiring multiple layers of encapsulation to be stripped away to ac-
cess older documents, while allowing the contents of a digital docu-
ment to be extracted for conversion into the current vernacular, with-
out losing the original form of the document. It should offer up-front
acceptance testing at accession time, to demonstrate that a given doc-
ument will be accessible in the future. Finally, the only assumptions
it should make about future computers are that they will be able to
perform any computable function and (optionally) that they will be
faster and/or cheaper to use than current computers.
8. The Emulation Solution
In light of the foregoing analysis, I propose that the best (if not the
only) way to satisfy the above criteria is to somehow run a digital
document’s original software. This is the only reliable way to recre-
ate a digital document’s original functionality, look, and feel. The
central idea of the approach I describe here is to enable the emula-
tion of obsolete systems on future, unknown systems, so that a digi-
tal document’s original software can be run in the future despite be-
ing obsolete. Though it may not be feasible to preserve every
conceivable attribute of a digital document in this way, it should be
possible to recreate the document’s behavior as accurately as de-
sired—and to test this accuracy in advance.
The implementation of this emulation approach would involve:
(1) developing generalizable techniques for specifying emulators that
will run on unknown future computers and that capture all of those
attributes required to recreate the behavior of current and future dig-
ital documents; (2) developing techniques for saving—in human-
readable form—the metadata needed to find, access, and recreate
digital documents, so that emulation techniques can be used for
preservation; and (3) developing techniques for encapsulating docu-
ments, their attendant metadata, software, and emulator specifica-
tions in ways that ensure their cohesion and prevent their corrup-
tion. Since this approach was first outlined (Michelson and
Rothenberg 1992, Rothenberg 1995a), it has received considerable
attention and has been cited as the only proposed approach that ap-
pears to offer a true solution to the problem of digital preservation
(Erlandson 1996).
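As a rough sketch only (the approach prescribes no particular data format, and all field names below are invented), such an encapsulation might be modeled as a record that groups the document's bit stream, the software and system environment needed to interpret it, a reference to an emulator specification for the original hardware platform, and human-readable explanatory metadata.

    from dataclasses import dataclass, field

    @dataclass
    class Encapsulation:
        """Illustrative grouping of the items the emulation approach would bundle."""
        # Bit streams to be copied verbatim at every media-refresh cycle.
        document_bits: bytes        # the original document file(s)
        application_bits: bytes     # the software that created or reads it
        system_bits: bytes          # the operating system environment it needs
        # One emulator specification per hardware platform; stored by reference
        # so that many encapsulations can share it.
        emulator_spec_id: str       # e.g. "hypothetical-platform-emulator-spec"
        # Human-readable explanatory material, annotations, and metadata.
        explanation: str
        metadata: dict = field(default_factory=dict)

    bundle = Encapsulation(
        document_bits=b"...original document bit stream...",
        application_bits=b"...original application binary...",
        system_bits=b"...original operating system image...",
        emulator_spec_id="hypothetical-platform-emulator-spec",
        explanation="Plain-text instructions for opening and using this bundle.",
        metadata={"title": "Example record", "created": "1999-01-01"},
    )
    print(bundle.metadata["title"])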
guaranteed that these bit streams will be copied verbatim when stor-
age media are refreshed, to avoid corruption. This first group of en-
capsulated items represents the original document in its entire soft-
ware context: given a computing platform capable of emulating the
document’s original hardware platform, this information should rec-
reate the behavior of the original document.
The second type of information in the encapsulation of a docu-
ment consists of a specification of an emulator for the document’s
original computing platform. The specification must provide suffi-
cient information to allow an emulator to be created that will run on
any conceivable computer (so long as the computer is capable of per-
forming any computable function). This emulator specification can-
not be an executable program, since it must be created without
knowledge of the future computers on which it will run. Among oth-
er things, it must specify all attributes of the original hardware plat-
form that are deemed relevant to recreating the behavior of the origi-
nal document when its original software is run under emulation.
Only one emulator specification need be developed for any given
hardware platform: a copy of it (or pointer to it) can then be encapsu-
lated with every document whose software uses that platform. This
provides the key to running the software encapsulated with the doc-
ument: assuming that the emulator specification is sufficient to pro-
duce a working emulator, the document can be read (accessed in its
original form) by running its original software under this emulator.
The final type of information in the encapsulation of a document
consists of explanatory material, labeling information, annotations,
metadata about the document and its history, and documentation for
the software and (emulated) hardware included in the encapsulation.
This material must first explain to someone in the future how to use
the items in the encapsulation to read the encapsulated digital docu-
ment. In order to fulfill this function, at least the top level of this ex-
planatory material must remain human-readable in the future, to
serve as a “bootstrap” in the process of opening and using the encap-
sulation. This is one place where standards may find a niche in this
5 While transliteration need not be tied to refresh cycles, doing so minimizes the
number of passes that must be made through a collection of digital material. If a
single annotation standard is selected for all documents in a given corpus or
repository during a given epoch to simplify document management,
transliteration could be performed for all documents in a collection in lock-step,
just as media refreshing is done in lock-step. Though transliteration does not
necessarily have to be done at the same time as refreshing, doing so would be
more efficient (though potentially riskier) than performing transliteration and
refreshing at different times.
ing it, or distributing it. Emulation is needed only when the docu-
ment is to be read or when its content is to be extracted for transla-
tion into some vernacular form.6
Second, an emulator specification for a given obsolete hardware
platform need be created only once for all documents whose soft-
ware uses that platform. This provides tremendous leverage: if an
emulator specification is created for any document or document
type, it will confer longevity on all other digital documents that use
any of the software that runs on the given hardware platform.
Third, an emulator for a given obsolete platform need be created
only once for each future platform on which emulation is required to
run. Once created for each new generation of computer, the emulator
for a given obsolete platform can be run whenever desired on any
computer of that new generation. Generating new, running emula-
tors for new computing platforms from saved emulator specifica-
tions will therefore be a rare process: once it has been done to access
any document on a new platform, the resulting emulator for that
platform can be used to access all other documents saved using the
emulation scheme. The process of generating an emulator from its
specifications can therefore be relatively inefficient (since it need be
performed only infrequently), so long as the emulator that is gener-
ated is reasonably efficient when it runs.
tion programs. The approach proposed here allows for this variation,
though it assumes that emulating hardware platforms will usually
make the most sense.
Emulating the underlying hardware platform appears to be the
best approach, given the current state of the art. We do not have ac-
curate, explicit specifications of software, but we do (and must) have
such specifications for hardware: if we did not, we could not build
hardware devices in the first place. Why is it that we can specify
hardware but not software? Any specification is intended for some
reader or interpreter. Application software is intended to be inter-
preted automatically by hardware to produce an ephemeral, virtual
entity (the running application) whose behavior we do not require to
be fully specified (except to the hardware that will run it), since it is
intended to be used interactively by humans who can glean its be-
havior as they use it. On the other hand, a hardware specification is
interpreted (whether by humans or software) to produce a physical
entity (a computer) whose behavior must be well-specified, since we
expect to use it as a building block in other hardware and software
systems. Hardware specifications are by necessity far more rigorous
and meaningful than those of software. Emulating hardware is there-
fore entirely feasible and is in fact done routinely.7
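The principle can be illustrated with a toy example: the sketch below, written for a hypothetical three-instruction machine invented for this purpose, shows software on one computer interpreting the saved instructions of another. A real emulator specification would of course describe an actual processor in full detail.

    def run_emulated(program, memory_size=16):
        """Interpret a program for a hypothetical three-instruction machine.

        Invented instruction set, used only for illustration:
          ("LOAD", reg, value)   put a constant into a register
          ("ADD", reg, other)    add register `other` into register `reg`
          ("STORE", reg, addr)   write a register into emulated memory
        """
        registers = {"A": 0, "B": 0}
        memory = [0] * memory_size
        for op, *args in program:
            if op == "LOAD":
                reg, value = args
                registers[reg] = value
            elif op == "ADD":
                reg, other = args
                registers[reg] += registers[other]
            elif op == "STORE":
                reg, addr = args
                memory[addr] = registers[reg]
            else:
                raise ValueError(f"unknown instruction: {op}")
        return memory

    # A saved "obsolete" program: compute 2 + 3 and store the result at address 0.
    saved_program = [("LOAD", "A", 2), ("LOAD", "B", 3),
                     ("ADD", "A", "B"), ("STORE", "A", 0)]
    print(run_emulated(saved_program)[0])  # prints 5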
Hardware emulation is also relatively easy to validate: when
programs intended for a given computer run successfully on an em-
ulator of that computer, this provides reasonable assurance that the
emulation is correct. Test suites of programs could be developed spe-
cifically for the purpose of validation, and an emulator specification
could be tested by generating emulators for a range of different exist-
ing computers and by running the test suite on each emulator. A test
suite of this kind could also be saved as part of the emulator specifi-
cation and its documentation, allowing an emulator generated for a
future computer to be validated (in the future, before being used) by
running the saved test suite. In addition, the computer museum ap-
proach dismissed above might be used to verify future emulators by
comparing their behavior with that of saved, obsolete machines.
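Such a validation harness might amount to little more than replaying the saved test suite and comparing results against outputs recorded while the original hardware was still available. The sketch below is hypothetical and assumes, for simplicity, that the generated emulator is exposed as an ordinary function.

    def validate_emulator(run_program, test_suite):
        """Replay saved test programs and compare against recorded outputs.

        run_program: callable mapping a saved test program to its observable result
        test_suite:  list of (program, expected_result) pairs recorded when the
                     original hardware was still available
        """
        failures = []
        for program, expected in test_suite:
            actual = run_program(program)
            if actual != expected:
                failures.append((program, expected, actual))
        return failures

    def stand_in_emulator(program):
        """Placeholder standing in for an emulator generated from a specification."""
        return sum(program)

    saved_suite = [([1, 2, 3], 6), ([10, -4], 6)]
    problems = validate_emulator(stand_in_emulator, saved_suite)
    print("emulator validated" if not problems else f"{len(problems)} test(s) failed")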
Furthermore, of the potential emulation approaches discussed
here, emulating hardware has the greatest leverage. Except for spe-
cial-purpose embedded processors (such as those in toasters, auto-
mobiles, watches, and other products), computers are rarely built to
run a single program: there are generally many more programs than
hardware platforms, even though some programs may run on more
than one platform. At any given moment, there are relatively few
hardware platforms in existence, though new hardware platforms
7 It is not yet clear whether digital preservation needs to include the retention of
attributes of the original medium on which a digital document was stored. It can
be argued that digital documents are (or should be) logically independent of their
storage media and therefore need not preserve the behavior of these media;
however, a counterargument to this might be that since some digital documents
are tailored to specific media (such as CD-ROM), they should retain at least some
attributes of those media (such as speed). The approach described here is neutral
with respect to this issue: it allows attributes of storage media to be emulated
when desired but does not require them to be.
to display themselves: not only are the images bundled with appro-
priate software, but they are protected from being accessed by any
other software, since there is no guarantee that such software will
treat the images appropriately. In this case, not even later versions of
the same program (such as Adobe Reader 3) are allowed to read the
images; though this may seem restrictive, it is in fact a good ap-
proach, since later versions of software do not always treat files cre-
ated by older versions appropriately.
All of these examples bundle software to be run on a known
platform, so none of them provides much longevity for their docu-
ments. Nevertheless they do prove that bundling original software
with a document is an effective way of making sure that the docu-
ment can be read.
The second class of natural experiments involves the use of emu-
lation to add longevity to programs and their documents. The first
example is a decades-old practice that hardware vendors have used
to provide upward compatibility for their customers. Forcing users
to rewrite all of their application software (and its attendant databas-
es, documents, and other files) when switching to a new computer
would make it hard for vendors to sell new machines. Many vendors
(in particular, IBM) have therefore often supplied emulation modes
for older machines in their new machines. The IBM 360, for example,
included an emulation mode for the older 7090/94 so that old pro-
grams could still be run. Apple did something similar when switch-
ing from the Motorola 68000 processor series to the PowerPC by in-
cluding an emulator for 68000 code; not only did this allow users to
run all of their old programs on the new machine, but significant
pieces of the Macintosh operating system itself were also run under
emulation after the switch, to avoid having to rewrite them. Whether
emulation is provided by a special mode using microcode or by a
separate application program, such examples prove that emulation
can be used to keep programs (and their documents) usable long af-
ter they would otherwise have become obsolete.
A second example of the use of emulation is in designing new
computing platforms. Emulation has long been used as a way of re-
fining new hardware designs, testing and evaluating them, and even
beginning to develop software for them before they have been built.
Emulators of this kind might be a first step toward producing the
emulator specifications needed for the approach proposed here:
hardware vendors might be induced to turn their hardware-design
emulators into products that could satisfy the emulator scheme’s
need for emulator specifications.
A final example of the use of emulation is in the highly active
“retro-computing” community, whose members delight in creating
emulators for obsolete video game platforms and other old comput-
ers. There are numerous World Wide Web sites listing hundreds of
free emulators of this kind that have been written to allow old pro-
grams to be run on modern computers. A particularly interesting ex-
ample of this phenomenon is the MAME (Multiple Arcade Machine
Emulator) system, which supports emulation of a large number of
One final piece of the puzzle is required to make the emulation ap-
proach work: how do we encapsulate all of the required items so that
they do not become separated or corrupted and so that they can be
handled as a single unit for purposes of data management, copying
to new media, and the like? While encapsulation is one of the core
concepts of computer science, the term carries a misleading connota-
tion of safety and permanence in the current context. An encapsula-
tion is, after all, nothing more than a logical grouping of items. For
example, whether these are stored contiguously depends on the details of the storage medium in use at any given time. The logical shell
implied by the term encapsulation has no physical reality (unless it is
implemented as a hardened physical storage device). And while it is
easy to mark certain bit streams as inviolate, it may be impossible to
prevent them from being corrupted in the face of arbitrary digital
manipulation, copying, and transformation.
8 Although a sequence of such translations may be needed over time, all that is really required is to save the sequence of encodings and translators: future custodians of this explanatory information could then safely defer translating a particular annotation until it is needed, so long as its encoding is not lost.
Techniques must therefore be developed for protecting encapsu-
lated documents and detecting and reporting (or correcting) any vio-
lations of their encapsulation. In addition, criteria must be defined
for the explanatory information that must be visible outside an encap-
sulation to allow the encapsulation to be interpreted properly.
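One simple protective measure, offered here as an illustrative sketch rather than as a prescribed mechanism, is to record a cryptographic digest of each inviolate bit stream so that later copies can be checked for corruption.

    import hashlib

    def fingerprint(bit_stream: bytes) -> str:
        """Return a digest that should remain stable across verbatim copies."""
        return hashlib.sha256(bit_stream).hexdigest()

    def verify(bit_stream: bytes, recorded_digest: str) -> bool:
        """Report whether a copied bit stream still matches its recorded digest."""
        return fingerprint(bit_stream) == recorded_digest

    original = b"original document bit stream"
    digest_at_accession = fingerprint(original)

    # Later, after refresh cycles and copying, the stream can be re-checked.
    print(verify(original, digest_at_accession))                           # True
    print(verify(b"corrupted document bit stream", digest_at_accession))   # False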
Many encapsulated digital documents from a given epoch will
logically contain common items, including emulator specifications
for common hardware platforms, common operating system and ap-
plication code files, software and hardware documentation, and
specifications of common annotation standards and their translators.
Physically copying all of these common elements into each encapsu-
lation would be highly redundant and wasteful of storage. If trust-
worthy repositories for such items can be established (by libraries,
archives, government agencies, commercial consortia, or other orga-
nizations), then each encapsulation could simply contain a pointer to
the required item (or its name and identifying information, along
with a list of alternative places where it might be found). Different
alternatives for storing common items may appeal to different insti-
tutions in different situations, so a range of such alternatives should
be identified and analyzed.
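How such a pointer might be resolved is sketched below, assuming (purely for illustration) that an encapsulation records the shared item's name together with an ordered list of candidate repositories.

    def resolve_shared_item(item_name, repositories):
        """Return the first copy of a shared item found in any candidate repository.

        repositories: ordered list of dicts mapping item names to their contents,
        standing in for libraries, archives, or other trusted stores.
        """
        for repo in repositories:
            if item_name in repo:
                return repo[item_name]
        raise LookupError(f"no repository holds {item_name!r}")

    # Invented example: two repositories, one of which holds the emulator
    # specification referenced (rather than copied) by many encapsulations.
    library_repo = {"platform-XYZ-emulator-spec": b"...specification bytes..."}
    archive_repo = {}
    pointer = {"item": "platform-XYZ-emulator-spec",
               "try_in_order": [archive_repo, library_repo]}

    spec = resolve_shared_item(pointer["item"], pointer["try_in_order"])
    print(len(spec), "bytes of emulator specification retrieved")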
There is also the question of what should go inside an encapsula-
tion versus what should be presented at its surface to allow it to be
manipulated effectively and efficiently. In principle, the surface of an
encapsulation should present indexing and cataloging information to
aid in storing and finding the encapsulated document, a description
of the form and content of the encapsulated document and its associ-
ated items to allow the encapsulation to be opened, contextual and
historical information to help a potential user (or document manag-
er) evaluate the relevance and validity of the document, and man-
agement information to help track usage and facilitate retention and
other management decisions. All of this information should be read-
able without opening the encapsulation, since none of it actually re-
quires reading the encapsulated document itself.
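The division between surface and interior might look roughly like the following invented layout, in which the information needed for cataloging, evaluation, and management sits in a readable surface structure while the encapsulated items themselves remain sealed.

    # Hypothetical two-part layout: a readable surface plus an opaque interior.
    surface = {
        "catalog": {"title": "Example record", "identifier": "DOC-0001"},
        "contents_description": "document + application + OS + emulator-spec pointer",
        "history": "accessioned 1999; media refreshed twice",
        "management": {"retention": "permanent", "access_count": 0},
        "how_to_open": "Plain-language instructions for using the interior items.",
    }
    interior = b"...sealed encapsulation of document, software, and metadata..."

    # Cataloging, searching, and management decisions can be made from the
    # surface alone, without ever opening the interior.
    def summarize(encapsulation_surface):
        cat = encapsulation_surface["catalog"]
        return f'{cat["identifier"]}: {cat["title"]}'

    print(summarize(surface))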
It is logically necessary only that the tip of this information pro-
trude through the encapsulation: there must be some explanatory
annotation on the surface that tells a reader how to open at least
enough of the encapsulation to access further explanatory informa-
tion inside the encapsulation. Even this surface annotation will gen-
erally not be immediately human-readable, if the encapsulation is
stored digitally. If it happens to be stored on a physical medium that
is easily accessible by humans (such as a disk), then this surface an-
notation might be rendered as a human-readable label on the physi-
cal exterior of the storage unit, but this may not be feasible. For ex-
ample, if a large number of encapsulations are stored on a single
10. Summary
The long-term digital preservation problem calls for a long-lived solution that does not require continual heroic effort or repeated inven-
tion of new approaches every time formats, software or hardware
paradigms, document types, or recordkeeping practices change. This
approach must be extensible, since we cannot predict future changes,
and it must not require labor-intensive translation or examination of
individual documents. It must handle current and future documents
of unknown type in a uniform way, while being capable of evolving
as necessary. Furthermore, it should allow flexible choices and
tradeoffs among priorities such as access, fidelity, and ease of docu-
ment management.
Most approaches that have been suggested as solutions to this
problem—including reliance on standards and the migration of digi-
tal material into new forms as required—suffer from serious inade-
quacies. In contrast, the emulation strategy as elaborated above,
though it requires further research and proof of feasibility, appears to
have many conceptual advantages over the other approaches sug-
gested and is offered as a promising candidate for a solution to the
problem of preserving digital material far into the future.