DSpace vs. Greenstone
D-Lib Magazine
September 2005
Volume 11 Number 9
ISSN 1082-9873
StoneD
A Bridge between Greenstone and DSpace
Ian H. Witten*, David Bainbridge*, Robert Tansley†, Chi-Yu Huang*, and
Katherine J. Don*
†Hewlett-Packard Labs
Cambridge, MA, USA
<[email protected]>
Abstract
Greenstone and DSpace are widely used software systems for digital
libraries, and prospective users sometimes wonder which one to adopt.
In fact, the aims of the two are very different, although their domains
of application do overlap. This article describes the two systems and
identifies their similarities and differences. We also present StoneD
[note 1], a bridge between the production versions of Greenstone and
DSpace that allows users of either system to easily migrate to the
other, or continue with a combination of both. This bridge eliminates
the risk of finding oneself locked in to an inappropriate choice of
system. We also discuss other possible opportunities for combining the
advantages of the two, to the benefit of the user communities of both
systems.
1. Introduction
Of the many open source systems for digital libraries, two of the most
prominent are Greenstone [1, 2] and DSpace [3, 4]. Greenstone is
Greenstone
The key points that Greenstone makes it its core business to support
include:
The liaison with UNESCO and Human Info has been a crucial factor in
the development of Greenstone. Human Info began using Greenstone
to produce collections in 1998, and provided extensive feedback on
the reader's interface. UNESCO wants to empower developing
countries to build their own digital library collections – otherwise they
DSpace
The key points that DSpace makes it its core business to support
include:
The liaison with MIT has been a crucial factor in the development of
DSpace. A co-development contract between Hewlett-Packard Labs
and MIT Libraries was established in March 2000, and MIT publicly
launched its institutional repository in November 2002. From the
outset, the plan was to create an infrastructure for storing the digitally
born intellectual output of the MIT community and to make it
accessible over the long term to the broadest possible readership [3].
Plans are afoot to develop the next version, which will be more
modular and flexible: different functional modules can be plugged in
to suit different needs, and the storage layer will be refactored so
that different approaches to the digital preservation problem can be
tested.
Differences
Note that both systems are continually evolving, and these features
can change rapidly. For example, Greenstone can indeed
accommodate dynamic collections by using a different search engine
from the default one. Although this is probably beyond the technical
capabilities of the librarian-level users that Greenstone targets, a user
interface enhancement could easily rectify this. Conversely, although
the default DSpace configuration is currently restricted to UNIX, it
would not be hard to modify it for other operating systems. And there
are some DSpace installations in languages other than English.
3. Building Bridges
OAI-PMH-level integration
METS-level integration
Much of the discussion above applies equally in the other direction.
While some of the terminology changes, the ideas remain the same, as
does the end result: seamless integration of Greenstone collections
within a DSpace site. By manipulating the inheritance hierarchy in
DSpace, developers could introduce new Java classes that access
Greenstone3 functionality. This could be accomplished at the servlet
level, taking advantage of the XML output option, or more directly
through the Greenstone message passing mechanism over SOAP.
make it easy for users to pick the features that suit them best.
4. StoneD
Example
In Figure 3 the user has accessed the Greenstone home page for the
same collection. Greenstone collections can easily be customized by
end users, but in this default case the two versions of the collection
offer essentially the same features. Here, users can search and browse
by title and author just as in DSpace, although the interface layout
differs. Figure 4 shows the page accessed by clicking the titles a-z
button in the navigation bar. From here the various source documents
that Greenstone users can search the full text of any document to
locate an item of interest. However, DSpace users can peruse a list of
recently added items, a notion that is less natural in a collection that is
built afresh each time. We emphasize that with a little effort each
system could be configured to add the missing facility if desired.
But first there is one further design element to consider. When the
Now we are ready to press the build button in the Create tab, shown
in Figure 6. On completion, this yields a collection visually identical to
that shown in Figures 3-4. One of the options on the Librarian
Interface's File menu is Export. Upon start-up, the system interrogates
the export script described earlier for a list of known export file
formats, and these formats are dynamically added into the interface.
Activating File→Export produces a popup that lists the available
formats – currently METS and DSpace. Choosing DSpace, browsing to
a suitable metadata mapping file (if required), and pressing the export
button produces a set of files that can be transferred to a DSpace
installation and imported in batch mode. The resulting collection is
shown in Figures 1-2.
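To make the export step more concrete, the following is a minimal sketch of what applying a metadata mapping and writing one exported item might look like. It is not the StoneD export script. It assumes the target is the item-per-directory layout that DSpace's batch importer reads (a dublin_core.xml file plus a contents file listing the item's files), and the mapping table, field names, and function names are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical mapping from source metadata names to
# (Dublin Core element, qualifier) pairs. A real mapping file
# would be chosen by the user at export time.
MAPPING = {
    "Title": ("title", "none"),
    "Creator": ("contributor", "author"),
    "Date": ("date", "issued"),
}


def build_dublin_core(metadata, mapping):
    """Translate {name: [values]} into a <dublin_core> element."""
    root = ET.Element("dublin_core")
    for name, values in metadata.items():
        if name not in mapping:
            continue  # fields without a mapping are skipped in this sketch
        element, qualifier = mapping[name]
        for value in values:
            dcvalue = ET.SubElement(
                root, "dcvalue", {"element": element, "qualifier": qualifier}
            )
            dcvalue.text = value
    return root


def write_item(item_dir, metadata, files, mapping=MAPPING):
    """Write one item directory: metadata, document files, and a contents list."""
    os.makedirs(item_dir, exist_ok=True)
    tree = ET.ElementTree(build_dublin_core(metadata, mapping))
    tree.write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as listing:
        for name, data in files.items():
            with open(os.path.join(item_dir, name), "wb") as out:
                out.write(data)
            listing.write(name + "\n")  # one file name per line


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "item_0001")
    write_item(
        target,
        {"Title": ["Sample document"], "Creator": ["A. Author"]},
        {"sample.txt": b"Full text of the sample document."},
    )
    print(sorted(os.listdir(target)))
```

Keeping the mapping as data rather than code mirrors the role of the mapping file in the interface: the same export logic can serve collections whose metadata sets differ.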
5. Conclusions
Greenstone and DSpace are both designed to help third parties set up
their own digital libraries. However, they represent rather different
perspectives and have different, and in many ways complementary,
goals and strengths. One goal they share is to be flexible, and both can
be customized and modified at many different levels – including the
programming level, since they are open source systems. This gives the
ultimate flexibility and yields significant advantages over closed-
source systems. Of course, this very flexibility makes fair comparison
tricky.
This article has compared and contrasted the two systems' goals in
terms of the core business that they aim to support, and compared
their features in terms of their natural domain of operation. A crude
caricature of the difference is that Greenstone supports individually
designed collections of different kinds of documents and metadata in
an international setting – epitomized by completely static collections
on CD-ROM or DVD – whereas DSpace supports institutions in their
struggle to capture and disseminate the intellectual output of an
institution and preserve it indefinitely – epitomized by its use by MIT
Libraries, who helped pioneer its development. However, each system
is highly flexible and customizable to meet a wide variety of needs.
There are fertile opportunities for crossover between the two systems.
As well as being of great practical benefit to users, studying these
opportunities sheds light on many practical issues of interoperability
between different digital library systems. Standard interoperability
frameworks include OAI-PMH, which focuses on interoperability of
metadata alone, and METS, which is a general framework that focuses
on interoperability of document and metadata containers. Neither of
these provides a sufficient mechanism for a satisfactory bridge
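As an illustration of what "interoperability of metadata alone" means in practice, here is a minimal sketch of an OAI-PMH harvesting step: it builds a ListRecords request for unqualified Dublin Core and pulls the titles out of a response. The verb, the metadataPrefix parameter, and the XML namespaces come from the OAI-PMH specification; the base URL, the sample response, and the function names are assumptions made for the example. Note that only metadata is returned; the documents themselves do not travel with it.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for the given repository base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(response_xml):
    """Return every dc:title found inside the records of a response."""
    root = ET.fromstring(response_xml)
    titles = []
    for record in root.iterfind(".//oai:record", NS):
        for title in record.iterfind(".//dc:title", NS):
            titles.append(title.text)
    return titles


# A hand-written sample response, trimmed to the parts used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The base URL is a placeholder, not a real repository.
    print(list_records_url("http://example.org/oai/request"))
    print(extract_titles(SAMPLE_RESPONSE))
```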
Acknowledgements
Notes
1 Pronounced Stone-Dee.
5 The converse problem is not an issue, because DSpace does not look
inside its document files except when directed to create a full-text
index, and then it simply skips over files that are in formats it cannot
process.
References
[4] Smith, M., Bass, M., McClellan, G., Tansley, R., Barton, M.,
Branschofsky, M., Stuve, D. and Walker, J.H. (2003) "DSpace: An open
source dynamic digital repository." D-Lib Magazine 9(1)
(doi:10.1045/january2003-smith).
[6] Witten, I. H., Cunningham, S.J. and Apperley, M. (1996) "The New
Zealand Digital Library Project." D-Lib Magazine 2(11)
(doi:10.1045/november96-witten).
[9] Bainbridge, D., Don, K.J., Buchanan, G.R., Witten, I.H., Jones, S.,
Jones, M. and Barr, S.I. (2004) "Dynamic digital library construction
and configuration." Proc European Digital Library Conference, Bath,
England.
pp 446-460.
[12] Bainbridge, D., Edgar, K.D., McPherson, J.R. and Witten, I.H.
(2003) "Managing change in a digital library system with many
interface languages." Proc European Conference on Digital Libraries,
Trondheim, Norway.
[16] Brittain, J. and Darwin, I.F. (2003) Tomcat: The definitive guide.
O'Reilly.
[19] Witten, I. H., Bainbridge, D., Paynter, G.W. and Boddie, S. (2002)
"The Greenstone plugin architecture." Proc Joint Conference on
Digital Libraries, Portland, Oregon.
Copyright © 2005 Ian H. Witten, David Bainbridge, Robert Tansley, Chi-Yu Huang and
Katherine J. Don
doi:10.1045/september2005-witten