
How to build a digital library using open-source software

Conference Paper · July 2002


DOI: 10.1145/544220.544365 · Source: DBLP


Author: Ian Witten (University of Waikato)



Building DLs using open source software Ian H. Witten

How to build a digital library using open source software

Learn how to build your own digital library with the Greenstone digital library software, an open-source system for managing collections

Ian H. Witten
Computer Science Department
Waikato University
New Zealand
http://nzdl.org/

Agenda

Part 1: What Greenstone can do
Part 2: Building a collection
– Plugins, classifiers, format statements
Part 3: Running Greenstone
– installation
– collection-building
– administrative/maintenance pages

May 2001

Part 1: What Greenstone can do

❖ What is a digital library?
❖ Greenstone software … illustrated
– collections
– demo of Humanity Development Library
– documents (+ pictures, voice, music)
– multilingual (+ Maori, French, Arabic, Chinese)
❖ Finding documents: searching and browsing
– full-text searching, fielded searching
– metadata-based browsing
– lists, alphabetic lists, date lists
– hierarchical browsing structures
❖ Document types and formats
– plugins and classifiers
❖ Configuring a collection

“Digital library” … means different things to different people!
Collection of digital objects (text, video, audio) along with methods for access and retrieval [user] and for selection, organisation, and maintenance [lib]

• Traditional user/librarian distinction is blurred
• Computers make information active
• Kitchens for knowledge preparation
• WWW ≠ DL! (organization, selectivity)
• Nice Web site ≠ DL! (import new documents easily)

Data: smallest discernible difference (change of state)
Information: difference between data and your expectations
Knowledge: accumulation of your set of expectations
Wisdom: the value attached to knowledge


Greenstone software

Collections
• “Library” = set of separate collections
• “Collection” = set of separate documents
• Multigigabyte collections
Documents
• Hierarchical document model
• Multimedia: picture, voice, music, video collections
• Multi-language documents: Unicode throughout
• Multi-language interfaces: French, Chinese, Arabic …
Access
• Web browser or CD-ROM
• Searching: full-text and fielded, ranked or Boolean
• Browsing: hierarchical indexes created from metadata
• Metadata: Dublin Core + collection-specific extensions
Importing
• Plugins: different document types and metadata specifications
• Classifiers: create browsing indexes (collection editor decides)
Distributing
• Compression techniques throughout: uses MG
• Distributed collections: coming soon, with CORBA
• Open-source software: free, extensible

Collections on the Web: nzdl.org (demo, not service)


Example: Humanity Development Library, for sustainable development and basic human needs

• Printed: 160,000 pages, 30,000 images, 800 books, 430 magazines; 340 kg; US$20,000
• CD-ROM: US$6; Win3.1x(!)/95/98/NT; stand-alone and intranet server; Web browser user interface

Global Help Project, Antwerp (+ UN agencies)


Greenstone collections on CD-ROM

UN and NGOs, e.g.
• UNESCO
• Global Help Project
• United Nations University
• World Health Organization
• Pan American Health Organization

The United Nations says ...

❖ We are profoundly concerned at the deepening mal-distribution of access, resources and opportunities in the information and communication field …
❖ A new type of poverty, “information poverty,” looms ...
❖ Most developing countries … are not sharing in the communications revolution …
❖ The knowledge gap is widening

Statement on Universal Access to Basic Communication and Information Services, 1997


What are documents?

• Hierarchical document model: sections, subsections, …, paragraphs
• Metadata: Dublin Core, for searching and browsing
• Multimedia: picture, voice, music
• Multi-language documents: Maori, French, Arabic, Chinese, …
• Multi-language interfaces: French, Chinese, …

Hierarchical document model: a book


Metadata specified at any level
A bibliography collection
French documents + French interface (UNESCO, Paris)


Arabic documents + English interface
Chinese documents (pictures of text) + Chinese interface


Chinese documents + Chinese interface
Acronym extraction plugin


Language identification plugin
Email plugin


Searching and browsing

• Searching
• Metadata-based browsing: Subject, Title, Publisher (Dublin Core); “HowTo” (ad hoc)

Searching: multiple indexes over text and metadata (editor chooses)


Different collection: different indexes (metadata)
Query types: ranked (OR) or Boolean (AND)
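The difference between the two query types can be illustrated over a toy inverted index. This is a minimal Python sketch, not MG's implementation: the tf × log(1 + N/df) weighting, the whitespace tokenizer, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = tf
    return index

def boolean_and(index, query):
    """Boolean AND: the documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def ranked_or(index, n_docs, query):
    """Ranked OR: every document with at least one term, best score first."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]

# Toy documents, invented for illustration.
docs = {
    "d1": "water buffalo farm",
    "d2": "snail farm",
    "d3": "water resources in arid lands",
}
index = build_index(docs)
print(boolean_and(index, "water farm"))           # {'d1'}
print(ranked_or(index, len(docs), "water farm"))  # d1 ranks first
```

Boolean AND returns only d1, the one document containing both words; ranked OR also returns d2 and d3, placed below d1 because each matches only one term.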


Full Boolean queries, plus other search preferences
Multilingual searching (Unicode)


Browsing: different “classifier” types
List classifier (Howto metadata)
AZList classifier (Title metadata)


DateList classifier (Date metadata)
Custom-made classifier (Title metadata): a simple variant of AZList (2 lines of PERL)


Hierarchy classifier (Subject metadata)
Multilevel hierarchy; information specified in auxiliary file

Different document types and formats: plugins and classifiers

Plugins
• format-specific parsing of source documents (and metadata specs)
• pipeline: files are passed to each plugin in turn
• about a dozen plugins (TEXT, HTML, EMAIL, WORD, RTF, PDF, PS …)

Classifiers
• create browsing indexes
• VLists, HLists, and DateLists
• hierarchical structure of lists
• List
– one-level: single VList
– two-level: HList and VList (e.g. SectionList, AZList, DateList)


Collection configuration file (collect.cfg) for the demo collection. It specifies:
• email of creator
• search indexes
• plugins
• classifiers
• how to format documents, query results, and classifiers
• name, icon, description, etc.

creator [email protected]
maintainer [email protected]
public true
beta true

indexes section:text section:Title document:text
defaultindex section:text

plugin GMLPlug
plugin HBPlug
plugin ArcPlug
plugin IndexPlug
plugin RecPlug

classify Hierarchy hfile=sub.txt metadata=Subject sort=Title
classify HDLList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"
format CL4VList "<br>[link][Howto][/link]"
format DocumentImages true
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"

collectionmeta collectionname "greenstone demo"
collectionmeta collectionextra "This is a demonstration collection for the
Greenstone digital library software.\nIt contains a small
subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"
collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"


Part 2: Building a collection

❖ What Greenstone does
❖ Using the Collector
❖ Altering the configuration
❖ GML: Greenstone markup language
❖ collect.cfg for the Demo collection
❖ Plugins
❖ Classifiers
– List
– AZList
– DateList
– HDLList
– Hierarchy
❖ Format strings
– classifiers
– search results
– document text

Greenstone DL software

Access
• Accessible via any Web browser
• Server runs on Windows and Unix
• Collections can be published on CD-ROM
Searching/browsing
• Full-text and fielded search
• Flexible browsing facilities
• Metadata-based (Dublin Core)
• Collection-specific
• Hierarchical phrase browsing supported
• Creates all access structures automatically
Extensible
• Plugins: new document and metadata formats
• Classifiers: new metadata browsers
Multilingual
• Documents and interfaces
• Chinese, Arabic, Maori, Russian etc. (+ European)
• Multimedia: video, audio collections exist
Distributed
• CORBA protocol allows remote access
• Z39.50 server/client for backwards compatibility
What you see, you can get!
• GNU licensed


The pen is mightier than the sword!
Building and distributing information carries responsibilities … legal … social … ethical …
Be aware of the power of information and use it wisely

Collector = software “wizard” for building new collections


Status updated every 5 secs


Collection configuration file


creator [email protected]
maintainer [email protected]
public true
beta true

indexes document:text
defaultindex document:text

plugin ZIPPlug
plugin GMLPlug
plugin TEXTPlug
plugin HTMLPlug -file_is_url
plugin EMAILPlug
plugin ArcPlug
plugin RecPlug

classify AZList metadata=Title

collectionmeta collectionname "Women’s History Excerpt"


collectionmeta collectionextra "This collection is an excerpt for \
demonstration purposes, based on the Women’s … \
… contains _about:numdocs_ documents"
collectionmeta .document:text "documents"
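Each line of the file is a keyword followed by its arguments, which makes it easy to process mechanically. The Python sketch below groups directives by keyword. It is not Greenstone's own (Perl) parser: the rules it applies (whitespace-separated fields, double-quoted values, a trailing backslash continuing the line as in the collectionextra entry) are assumptions drawn from the examples on this slide, and the email address is a placeholder.

```python
import shlex
from collections import defaultdict

def parse_collect_cfg(text):
    """Group collect.cfg-style directives by their leading keyword.

    Assumed rules: one directive per logical line, a trailing backslash
    joins the next physical line, and double quotes group words.
    """
    directives = defaultdict(list)
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "  # continuation: keep reading
            continue
        fields = shlex.split(pending + line)
        pending = ""
        if fields:
            directives[fields[0]].append(fields[1:])
    return dict(directives)

# Raw string so the backslash-newline continuation reaches the parser intact.
CFG = r"""creator someone@example.org
public true
indexes document:text document:Title
plugin HTMLPlug -file_is_url
plugin RecPlug
classify AZList metadata=Title
collectionmeta collectionextra "An excerpt for \
demonstration purposes"
"""

config = parse_collect_cfg(CFG)
print(config["plugin"])  # [['HTMLPlug', '-file_is_url'], ['RecPlug']]
```

Quoted values that span several lines without a trailing backslash, such as the SearchVList format string in the demo collection, would make `shlex.split` raise an error here; handling them would need a tokenizer that reads across lines.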

Alter configuration

• Add full-text index of titles: indexes document:Title (additional indexes line)
• ... or authors: indexes … document:Creator (needs author metadata)
• Add alphabetic author browser: classify AZList -metadata Creator (add classifier line)
• Include Word documents: plugin WordPlug (add plugin line)
• Include PDF documents: plugin PDFPlug (same)
• Separate index for each language: languages en fr es (add languages line)
• Extract acronyms and add list: plugin PDFPlug -extract_acronyms (plugin option)
• Extract keyphrases and add browser: (coming real soon)
• Extract phrase hierarchy and add browser: classify phind (add classifier line)
• Alter the format of any of the above: format (add format string)
• Restrict collection’s interface languages: format PreferenceLangs en|fr|es (add format string)
• Change default interface language: cgiarg shortname=l argdefault=fr (edit site config file)


Dublin Core metadata

Metadata (Tag): Definition
Title (Title): A name given to the resource
Creator (Creator): An entity primarily responsible for making the content of the resource
Subject and keywords (Subject): The topic of the content of the resource
Description (Description): An account of the content of the resource
Publisher (Publisher): An entity responsible for making the resource available
Contributor (Contributor): An entity responsible for making contributions to the content of the resource
Date (Date): A date associated with an event in the life cycle of the resource
Resource type (Type): The nature or genre of the content of the resource
Format (Format): The physical or digital manifestation of the resource
Resource identifier (Identifier): An unambiguous reference to the resource within a given context; this is the object identifier or OID
Source (Source): A reference to a resource from which the present resource is derived
Language (Language): A language of the intellectual content of the resource
Relation (Relation): A reference to a related resource
Coverage (Coverage): The extent or scope of the content of the resource
Rights management (Rights): Information about rights held in and over the resource

GML: Greenstone markup language

<gsdlsection>
<metadata>
<gsdlsourcefilename> uu02fe.txt </gsdlsourcefilename>
<gsdldoctype> indexed_doc </gsdldoctype>
<Identifier> HASHa723e7e164df07c833bfc4 </Identifier>
<Title> Freshwater Resources in Arid Lands </Title>
<gsdlassocfile> cover.jpg:image/jpeg </gsdlassocfile>
<gsdlassocfile> p21.jpg:image/jpeg </gsdlassocfile>
<gsdlassocfile> p22.jpg:image/jpeg </gsdlassocfile>
</metadata>

This is the text of the document

</gsdlsection>


GML: Greenstone markup language

<gsdlsection>
<metadata> ... </metadata>

<gsdlsection>
<metadata>
<gsdlnum> 1 </gsdlnum>
<Title> Preface </Title>
</metadata>
This is the text of the preface

</gsdlsection>

Rest of the document

</gsdlsection>

GML: Greenstone markup language


<gsdlsection>
<metadata> ... </metadata>
<gsdlsection> <metadata> ... </metadata> ... </gsdlsection>

<gsdlsection>
<metadata>
<gsdlnum> 2 </gsdlnum>
<Title> Conclusions </Title>
</metadata>

<gsdlsection>
<metadata>
<gsdlnum> 1 </gsdlnum>
<Title> Part 1 </Title>
</metadata>

This is the first part of the conclusions


</gsdlsection>
</gsdlsection>
Rest of the document
</gsdlsection>


GML: Greenstone markup language


<gsdlsection>
<metadata> ... </metadata>
<gsdlsection> <metadata> ... </metadata> ... </gsdlsection>

<gsdlsection>
<metadata> ... Conclusions </metadata>
<gsdlsection>
<metadata> ... Part 1 </metadata> ... </gsdlsection>

<gsdlsection>
<metadata>
<gsdlnum> 2 </gsdlnum>
<Title> Part 2 </Title>
</metadata>

This is the second part of the conclusions

</gsdlsection>
</gsdlsection>

Rest of the document

</gsdlsection>
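Because nested `<gsdlsection>` elements carry the document hierarchy, an ordinary XML parser can recover it. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming well-formed markup and that each subsection's number is stored in `<gsdlnum>`; the sample combines the titles from the GML examples above, with text and metadata abbreviated.

```python
import xml.etree.ElementTree as ET

# Condensed from the GML examples; text and metadata abbreviated.
GML = """<gsdlsection>
<metadata><Title> Freshwater Resources in Arid Lands </Title></metadata>
<gsdlsection>
<metadata><gsdlnum> 1 </gsdlnum><Title> Preface </Title></metadata>
This is the text of the preface
</gsdlsection>
<gsdlsection>
<metadata><gsdlnum> 2 </gsdlnum><Title> Conclusions </Title></metadata>
<gsdlsection>
<metadata><gsdlnum> 1 </gsdlnum><Title> Part 1 </Title></metadata>
This is the first part of the conclusions
</gsdlsection>
<gsdlsection>
<metadata><gsdlnum> 2 </gsdlnum><Title> Part 2 </Title></metadata>
This is the second part of the conclusions
</gsdlsection>
</gsdlsection>
</gsdlsection>"""

def walk(section, number=""):
    """Yield (section number, title) for a section and its subsections in order."""
    title = section.find("metadata").findtext("Title", default="").strip()
    yield number, title
    for child in section.findall("gsdlsection"):
        num = child.find("metadata").findtext("gsdlnum", default="").strip()
        yield from walk(child, f"{number}.{num}" if number else num)

for number, title in walk(ET.fromstring(GML)):
    print(number or "(document)", title)
# prints, one per line: (document) Freshwater Resources in Arid Lands,
# 1 Preface, 2 Conclusions, 2.1 Part 1, 2.2 Part 2
```

The dotted numbers (2.1, 2.2) fall out of the nesting alone, which is what lets section-level indexes and browsers address any part of the hierarchy.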

collect.cfg for the demo collection

creator [email protected]
maintainer [email protected]
public true
beta true

indexes section:text section:Title document:text
defaultindex section:text

plugin GMLPlug
plugin HBPlug
plugin ArcPlug
plugin IndexPlug
plugin RecPlug

classify Hierarchy hfile=sub.txt metadata=Subject sort=Title
classify AZList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"
format CL4VList "<br>[link][Howto][/link]"
format DocumentImages true
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"

collectionmeta collectionname "greenstone demo"
collectionmeta collectionextra "This is a demonstration collection for the
Greenstone digital library software.\nIt contains a small
subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"
collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"


Indexes: document, “section” or paragraph text, or any metadata

indexes section:text section:Title document:text
defaultindex section:text

collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"


collectionmeta collectionname "greenstone demo"
collectionmeta collectionextra "This is a demonstration collection for the
Greenstone digital library software.\nIt contains a small
subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"

Help text generated automatically

Plugins

plugin GMLPlug
plugin HBPlug
plugin ArcPlug
plugin IndexPlug
plugin RecPlug

Used by the collection-building software to accomplish format-specific parsing of source documents.

Plugin pipeline: files are passed to each plugin in turn until one is found that can process it.

• GMLPlug processes .gml files generated during import
• HBPlug processes HTML marked up for UN collections
• ArcPlug processes the .gml file list in archives.inf
• IndexPlug assigns metadata from the index.txt file
• RecPlug recurses through a directory structure

Also TEXTPlug, HTMLPlug, EMAILPlug, WORDPlug, RTFPlug, PDFPlug, PSPlug, FoxPlug, PrePlug, GBPlug, TCCPlug …
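The pipeline rule (each file is offered to the plugins in order until one accepts it) can be sketched in a few lines of Python. The class names echo plugins listed above, but the `can_process`/`process` interface and the suffix-based matching are hypothetical; Greenstone's real plugins are Perl modules with richer tests, and directory recursion (RecPlug's job) is not modelled here.

```python
from pathlib import PurePosixPath

class Plugin:
    """Hypothetical plugin interface: claim files by filename suffix."""
    suffixes = ()

    def can_process(self, path):
        return PurePosixPath(path).suffix.lower() in self.suffixes

    def process(self, path):
        # A real plugin would parse the file and extract text and metadata.
        return {"plugin": type(self).__name__, "file": path}

class GMLPlug(Plugin):
    suffixes = (".gml",)

class HTMLPlug(Plugin):
    suffixes = (".htm", ".html")

class TEXTPlug(Plugin):
    suffixes = (".txt",)

def run_pipeline(paths, plugins):
    """Offer each file to each plugin in turn; the first that accepts it wins."""
    processed, unclaimed = [], []
    for path in paths:
        for plugin in plugins:
            if plugin.can_process(path):
                processed.append(plugin.process(path))
                break
        else:  # no plugin accepted this file
            unclaimed.append(path)
    return processed, unclaimed

files = ["import/b22bue.gml", "import/notes.txt", "import/page.html", "import/cover.jpg"]
processed, unclaimed = run_pipeline(files, [GMLPlug(), HTMLPlug(), TEXTPlug()])
for item in processed:
    print(item["plugin"], item["file"])
print("unclaimed:", unclaimed)
```

Because the first match wins, the order of plugin lines matters: a plugin that accepts very broadly, placed early, would hide the more specific plugins listed after it.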


import/index.txt

Used by IndexPlug to add metadata to .gml files. The key line names the metadata fields (Subject, Organization, Howto, Magazine); each following line gives a filename and then its metadata values.

key: Subject Organization Howto Magazine
bostid/b22bue 16.11 bostid "start a butterfly farm"
faobetf/fb33fe 14.12 faobfs <Subject>16.11
faobetf/fb34fe 14.12 faobfs "farm snails" <Subject>16.11
bostid/b18ase 16.11 bostid "introduce little-known Asian farm animals with a promising future"
bostid/b20cre 16.11 bostid
bostid/b17mie 16.11 bostid "introduce small animals and micro-livestock in your farm"
bostid/b21wae 16.5 bostid "utilize the Water Buffalo more effectively" <Subject>16.11
ecourier/ec158e 23.15 ecc <Subject>8.1 "<Magazine>The Courier"
ecourier/ec159e 23.15 ecc <Subject>6.1 "<Magazine>The Courier"
ecourier/ec160e 23.15 ecc <Subject>21.1 "<Magazine>The Courier"
wb/wb34te 6.4 wb "achieve gender equality"

Multiple assignments are possible, e.g. <Subject>16.11 gives a file a second Subject value.
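A minimal Python sketch of how such a file could be read. The interpretation is an assumption based on the slide, not IndexPlug's actual code: untagged values fill the fields named on the key line from left to right, and a value written as `<Field>value` is added to the named field, which is how one file can carry two Subject values.

```python
import re
import shlex

TAGGED = re.compile(r"<(\w+)>(.*)")  # e.g. <Subject>16.11 or <Magazine>The Courier

def parse_index_txt(text):
    """Return {filename: {field: [values]}} for an index.txt-style file.

    Assumed rules: the "key:" line lists the field order; untagged values
    fill those fields left to right; "<Field>value" adds to the named field.
    """
    fields, records = [], {}
    for line in text.splitlines():
        tokens = shlex.split(line)
        if not tokens:
            continue
        if tokens[0] == "key:":
            fields = tokens[1:]
            continue
        filename, meta, position = tokens[0], {}, 0
        for value in tokens[1:]:
            match = TAGGED.match(value)
            if match:
                name, value = match.groups()
            else:
                if position >= len(fields):
                    raise ValueError(f"too many values for {filename}")
                name = fields[position]
                position += 1
            meta.setdefault(name, []).append(value)
        records[filename] = meta
    return records

INDEX_TXT = '''key: Subject Organization Howto Magazine
bostid/b22bue 16.11 bostid "start a butterfly farm"
faobetf/fb34fe 14.12 faobfs "farm snails" <Subject>16.11
ecourier/ec158e 23.15 ecc <Subject>8.1 "<Magazine>The Courier"
'''

for filename, meta in parse_index_txt(INDEX_TXT).items():
    print(filename, meta)
```

Under these rules, faobetf/fb34fe ends up with Subject values 14.12 and 16.11, so it would appear under both nodes of the Subject hierarchy.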

Classifiers

classify Hierarchy hfile=sub.txt metadata=Subject sort=Title
classify AZList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto

Used to create a collection’s browsing indexes; the information generated is stored in the GDBM database.

• Hierarchy: hierarchical structure of VLists, HLists, and DateLists
• List: one-level hierarchy consisting of a single VList, or two-level hierarchy consisting of an HList and a VList
– SectionList
– AZList
– AZSectionList
– DateList


classify List metadata=Howto

• The List classifier creates a linear list of the specified metadata values
• The AZList classifier creates a list with A-Z tabs
• The DateList classifier creates a list of dates

(Recall: Howto metadata was specified in the index.txt file.)

List classifier (Howto)
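The three list classifiers differ mainly in how they group documents by one metadata field. A minimal Python sketch over in-memory records; the grouping rules (one tab per initial letter, one group per four-digit year of a YYYY-MM-DD date) are simplifying assumptions rather than the behavior of Greenstone's Perl classifiers, which may, for example, merge sparsely populated letters into ranges. The records are invented.

```python
from collections import defaultdict

def list_classifier(docs, field):
    """List: a single VList of (value, doc id) pairs sorted by value."""
    return sorted((doc[field], doc["id"]) for doc in docs if field in doc)

def azlist_classifier(docs, field):
    """AZList: an HList of initial-letter tabs, each holding a VList."""
    tabs = defaultdict(list)
    for value, doc_id in list_classifier(docs, field):
        tabs[value[:1].upper()].append((value, doc_id))
    return dict(sorted(tabs.items()))

def datelist_classifier(docs, field="Date"):
    """DateList: documents grouped by year, assuming YYYY-MM-DD dates."""
    years = defaultdict(list)
    for value, doc_id in list_classifier(docs, field):
        years[value[:4]].append((value, doc_id))
    return dict(sorted(years.items()))

docs = [  # toy records; titles and dates are invented for illustration
    {"id": "d1", "Title": "Water harvesting", "Date": "1995-03-01"},
    {"id": "d2", "Title": "Animal traction", "Date": "1993-07-12"},
    {"id": "d3", "Title": "Agroforestry", "Date": "1995-11-30"},
    {"id": "d4", "Howto": "farm snails"},
]

print(list_classifier(docs, "Howto"))
print(azlist_classifier(docs, "Title"))
print(datelist_classifier(docs))
```

Documents lacking the chosen field (d4 has no Title or Date) are simply left out of that browser, which is why each classifier names the metadata it is built from.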


AZList classifier (Title)
classify AZList metadata=Title

AZList classifier (Title) for the demo collection
classify AZList metadata=Title


DateList classifier
classify DateList metadata=Date

HDLList classifier (Title)
classify HDLList metadata=Title


The HDLList classifier

HDLList is AZList + extra code to read etc/mags.txt and special-case magazines (2 lines)

classify HDLList metadata=Title

etc/mags.txt: each line gives an identifier (matches Magazine metadata value), a position in the browsing hierarchy, and a title for the hierarchy browser

CERES 1 CERES
"Food and Nutrition Bulletin" 2 "Food and Nutrition Bulletin"
"The Courier" 3 "The Courier"
"SPORE Bulletin" 4 "SPORE Bulletin"
"Boiling Point" 5 "Boiling Point"
"Developing Ideas" 6 "Developing Ideas"
"GATE Magazine" 7 "GATE Magazine"
"Go Between" 8 "Go Between"
"Basin - News" 9 "Basin - News"

classify Hierarchy hfile=org.txt metadata=Organization sort=Title

etc/org.txt: each line gives an identifier (matches Organization metadata value), the position in the hierarchy that the browser implements, and the title to be displayed in the hierarchy browser

accu 1 ACCU
ag21 2 "Agenda 21"
bgz 3 "BASIN - GTZ - SKAT"
bostid 4 BOSTID
cps 5 CPS
cpas 6 "CTA Spore"
cf 7 "Commonwealth Foundation"
csu 8 "Computer Science Unplugged"
dcs 9 "Development Consultancy Services"
ecc 11 "EC Courier"
ecdg8 12 "EC DG8"
echo 13 "Educational Concerns for Hunger Organization"
fao 14 FAO
faobfs 15 "FAO Better Farming series"
ceres 16 "FAO Ceres"
ff 17 "Food First"


Hierarchy classifier (Organization)
classify Hierarchy hfile=org.txt metadata=Organization sort=Title


classify Hierarchy hfile=sub.txt metadata=Subject sort=Title

etc/sub.txt: each line gives an identifier (matches Subject metadata value), the position in the hierarchy that the browser implements, and the title to be displayed in the hierarchy browser

1 1 "General reference"
1.2 1.2 "Dictionaries, glossaries, language courses, terminology (all languages)"
2 2 "Sustainable Development, International cooperation, Projects; NGO, Organizations, Poverty and Hunger Alleviation, Basic Human Needs"
2.1 2.1 "Development policy and theory, international cooperation, national planning, national plans"
2.2 2.2 "Development, national planning, national plans"
2.3 2.3 "Project planning and evaluation (incl. project management and dissemination strategies)"
2.4 2.4 "Regional development and planning incl. regional profiles"
2.5 2.5 "Nongovernmental organizations (NGOs) in general, self-help organizations (their role in development). For specific organizations such as women's or peasants' see respective class within 06."
2.6 2.6 "Organizations, institutions, United Nations (general, directories, yearbooks, annual reports, etc.)"
2.6.1 2.6.1 "United Nations"
2.6.2 2.6.2 "International organizations"
2.6.3 2.6.3 "Regional organizations"
2.6.5 2.6.5 "European Community - European Union"
2.7 2.7 "Sustainable Development, Development models and examples;
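A sketch of how a Hierarchy classifier could turn such a file into a browsing tree: dotted positions such as 2.6.1 are nested under their prefix (2.6), and each document is attached to every node whose identifier matches one of its metadata values, in Title order (sort=Title). The nesting rule, the treatment of a node whose parent is absent as top-level, and the toy Subject assignments are assumptions for illustration; the titles are copied from sub.txt.

```python
import shlex

# Excerpt of etc/sub.txt (entry 2 omitted, so 2.6 becomes a top-level node here).
SUB_TXT = '''1 1 "General reference"
1.2 1.2 "Dictionaries, glossaries, language courses, terminology (all languages)"
2.6 2.6 "Organizations, institutions, United Nations (general, directories, yearbooks, annual reports, etc.)"
2.6.1 2.6.1 "United Nations"
2.6.5 2.6.5 "European Community - European Union"
'''

def read_hfile(text):
    """Map identifier -> (position, title) for each line of an hfile."""
    entries = {}
    for line in text.splitlines():
        fields = shlex.split(line)
        if len(fields) == 3:
            identifier, position, title = fields
            entries[identifier] = (position, title)
    return entries

def build_tree(entries, docs, field):
    """Nest nodes by dotted position and attach documents sorted by Title."""
    nodes = {pos: {"title": title, "children": [], "docs": []}
             for pos, title in entries.values()}
    roots = []
    for pos in sorted(nodes, key=lambda p: [int(n) for n in p.split(".")]):
        parent = pos.rpartition(".")[0]  # "2.6.1" -> "2.6", "1" -> ""
        (nodes[parent]["children"] if parent in nodes else roots).append(pos)
    for doc in sorted(docs, key=lambda d: d["Title"]):  # sort=Title
        for value in doc.get(field, []):
            if value in entries:
                nodes[entries[value][0]]["docs"].append(doc["Title"])
    return roots, nodes

def show(positions, nodes, depth=0):
    """Print the tree, indenting one step per level."""
    for pos in positions:
        node = nodes[pos]
        print("  " * depth + f"{pos} {node['title']} {node['docs']}")
        show(node["children"], nodes, depth + 1)

# Toy documents with invented Subject assignments.
docs = [
    {"Title": "Freshwater Resources in Arid Lands", "Subject": ["2.6.1"]},
    {"Title": "Agenda 21", "Subject": ["2.6.1", "2.6.5"]},
]
roots, nodes = build_tree(read_hfile(SUB_TXT), docs, "Subject")
show(roots, nodes)
```

The same reader works for org.txt, whose identifiers (accu, ag21, …) differ from their positions: documents are matched on the identifier, and the tree is built from the position.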

Hierarchy classifier (Subject)



Format strings (interpreted at display time)

format DocumentImages true
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"
format CL4VList "<br>[link][Howto][/link]"
format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"

Formatting applies to:
• document text
• classifiers (separate HList or VList specs)
• search results
(sensible defaults throughout)

Components:
• HTML
• [Text] displays document text
• [Title], [Howto] … display metadata
• [link] … [/link] links to the document
• [parent] refers to the parent document
• [icon] page or folder icon
• if statement
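To show how these components combine, here is a minimal Python renderer for a small subset of the syntax: `[link]`…`[/link]`, `[icon]`, and `[Name]` placeholders for text or metadata. It is a sketch, not Greenstone's format-string engine; the `url` field, the icon file name, and the document record are hypothetical, and `{If}` statements and `[parent]` are not handled.

```python
import re

def render(template, doc):
    """Expand a simplified Greenstone-style format string for one document.

    Hypothetical subset: [link]...[/link] wrap an anchor to doc["url"],
    [icon] becomes a fixed image, and any other [Name] is replaced by
    doc["Name"] (text or metadata), or by "" when the field is absent.
    """
    out = template.replace("[link]", '<a href="%s">' % doc["url"])
    out = out.replace("[/link]", "</a>").replace("[icon]", '<img src="book.gif">')
    return re.sub(r"\[(\w+)\]", lambda m: str(doc.get(m.group(1), "")), out)

doc = {  # toy record; "url" is an assumed field, not Greenstone metadata
    "url": "/doc/example",
    "Title": "Freshwater Resources in Arid Lands",
    "Howto": "farm snails",
    "Text": "This is the text of the document",
}

print(render("<br>[link][Howto][/link]", doc))
# <br><a href="/doc/example">farm snails</a>
print(render("<h3>[Title]</h3><p>[Text]", doc))
# <h3>Freshwater Resources in Arid Lands</h3><p>This is the text of the document
```

Because the strings are expanded at display time, changing a format line alters the presentation without rebuilding the collection's indexes.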

List classifier (Howto)

<br>[link][Howto][/link]

format CL4VList "<br>[link][Howto][/link]"

CL4 refers to the fourth classify line, the List classifier on Howto metadata:

classify Hierarchy hfile=sub.txt metadata=Subject sort=Title
classify HDLList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto


Formatting search results

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"

• [link][icon][/link]: page or folder icon, linked to the document
• [parent(All': '):Title]: Title metadata of all hierarchically enclosing parents, separated by “:”
• {If}{…}: use the empty string if there is no parent
• [link][Title][/link]: the title, linked to the document
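The search-result string relies on two ideas: `[parent(All': '):Title]` joins the titles of all enclosing sections with `': '`, and the `{If}` emits that prefix (plus a trailing `': '`) only when it is non-empty. A Python sketch of that logic, assuming section identifiers like 2.1 whose ancestors are the whole document (written `""`) followed by each dotted prefix; the titles are combined from the GML examples.

```python
def ancestors(section_id):
    """'2.1' -> ['', '2']: the whole document ('') and then each enclosing section."""
    if section_id == "":
        return []
    parts = section_id.split(".")
    return [""] + [".".join(parts[:i]) for i in range(1, len(parts))]

def search_result_label(section_id, titles, separator=": "):
    """Sketch of {If}{[parent(All': '):Title],[parent(All': '):Title]: }[Title]."""
    parents = separator.join(titles[a] for a in ancestors(section_id))
    prefix = parents + separator if parents else ""  # the {If}: nothing when no parent
    return prefix + titles[section_id]

# Section titles from the GML examples; "" stands for the whole document.
titles = {
    "": "Freshwater Resources in Arid Lands",
    "1": "Preface",
    "2": "Conclusions",
    "2.1": "Part 1",
    "2.2": "Part 2",
}
print(search_result_label("2.1", titles))  # Freshwater Resources in Arid Lands: Conclusions: Part 1
print(search_result_label("", titles))     # Freshwater Resources in Arid Lands
```

Without the `{If}`, a top-level document would be displayed with a stray leading ": ", which is exactly what the conditional avoids.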

Formatting document text

format DocumentImages true
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"


collect.cfg for the demo collection

creator [email protected]
maintainer [email protected]
public true
beta true

indexes section:text section:Title document:text
defaultindex section:text

plugin GMLPlug
plugin HBPlug
plugin ArcPlug
plugin IndexPlug
plugin RecPlug

classify Hierarchy hfile=sub.txt metadata=Subject sort=Title
classify HDLList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto

format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"
format CL4VList "<br>[link][Howto][/link]"
format DocumentImages true
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"

collectionmeta collectionname "greenstone demo"
collectionmeta collectionextra "This is a demonstration collection for the
Greenstone digital library software.\nIt contains a small
subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"
collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"

Part 3: Running Greenstone

– installation
– collection-building
– administrative/maintenance pages

Admin support
❖ Add new user with … privileges
– E.g. admin, collection-building
❖ Check what collections are available
– Including “private” ones not on home page
❖ Check summary info about a collection
– build date
– collection metadata
– interface language preferences
– number of docs/sections/words/bytes
❖ Logging
– switch on or off
– check user logs—every page access is logged
– check system logs—errors are logged
❖ Notify whenever a new collection is built
❖ Browse technical info about the installation

DL admin / computer systems admin


Installing Greenstone: Windows or Unix?

Windows (binaries available for all versions)
• 3.x: serves collections but no building
• 95/98: full version available
• NT: full version available; only “Administrators” can install software
• 2000: full version available; only “Administrators” can install software

Unix (may need “root” login to install)
• Linux: full version available; source code tested, binaries available
• Sun Solaris: full version available; source code tested
• Macintosh OS/X or other: full version available; untested

• PERL 5 for collection-building
• Web server (e.g. Apache); fastCGI optional
• GCC for Unix compilation
• Visual C++ for Windows compilation
• GDBM (GNU database manager), included for Windows
• MG, crypt included

Downloading collections

Collection | Abbrev. | Built size (MB) | Download size (MB)
Arabic demonstration collection | arabic | 5 | 4
Bibliothèque pour le développement Durable et les Besoins Essentiels | tulane | 492 | 340
Chinese demonstration collection | chinese | 1 | 470
Collection on critical global issues | ccgi | 160 | 102
Computer science bibliography | csbib | 866 | 112
Computer science technical reports | cstr | 2010 | 1800
Food and Nutrition Library | fnl | 155 | 98
HCI bibliography | hcibib | 36 | 5
Humanity Development Library | hdl | 199 | 387
Indigenous Peoples | ipc | 8 | 4
Medical and Health Library | mhl | 142 | 73
Maori newspapers | niupepa | 670 | 659
Oral history | ohist | 430 | 421
Project Gutenberg | gutenberg | 510 | 427
Sahel point doc | unesco | 113 | 78
The computists weekly | tcc | 22 | 5
Tidbits magazine | tidbits | 12 | 5
United Nations University collection | unu | 97 | 71
Virtual Disaster Library | paho | 110 | 73
World Environment Library | envl | 309 | 220
Women’s history | whist | 16 | 12
Total | | 6363 | 5366


Demo
❖ Delete and re-install the software
❖ Look around the directory structure
❖ Use The Collector to build a collection from C:\Perl\html\lib\CGI
❖ Alter the Demo collection’s config file: change the format of the howto list
❖ Browse Maintenance/Admin pages
