How to Build a Digital Library Using Open-source s
All content following this page was uploaded by Ian Witten on 04 November 2014.
Ian H. Witten
Computer Science Department
Waikato University
New Zealand
http://nzdl.org/
Agenda
May 2001
Building DLs using open source software Ian H. Witten
❖ Configuring a collection
Greenstone software
Collections
“Library” = set of separate collections
“Collection” = set of separate documents
Multigigabyte collections
Collections on the Web: nzdl.org (demo, not service)
Example
Metadata
Dublin core, for searching and browsing
Multimedia
Picture, Voice, Music
Multi-language documents
Maori, French, Arabic, Chinese, …
Multi-language interfaces
French, Chinese, …
Hierarchical document model
A book
Metadata specified at any level
A bibliography collection
French documents + French interface (UNESCO, Paris)
Arabic documents + English interface
Chinese documents (pictures of text) + Chinese interface
Chinese documents + Chinese interface
Acronym extraction plugin
Language identification plugin
Email plugin
Searching: multiple indexes (editor chooses)
– text
– metadata
Different collection: different indexes
– metadata
Ranked OR
Boolean AND
Full Boolean queries, plus other search preferences
Multilingual searching (Unicode)
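The difference between the two query modes named above can be illustrated with a toy inverted index. This is only a sketch under stated assumptions: the documents are made up, and ranking by summed term frequency is a simplification chosen for the example, not a description of how Greenstone's indexer scores results.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Map each lowercased word to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index

def boolean_and(index, terms):
    """Return the set of documents that contain every query term."""
    postings = [set(index.get(term.lower(), ())) for term in terms]
    return set.intersection(*postings) if postings else set()

def ranked_or(index, terms):
    """Return documents containing any term, best match first.

    The score is the total count of query terms in the document
    (an assumption made for this sketch).
    """
    scores = Counter()
    for term in terms:
        for doc_id, count in index.get(term.lower(), {}).items():
            scores[doc_id] += count
    return [doc_id for doc_id, _ in scores.most_common()]

# Hypothetical three-document collection.
docs = {
    "d1": "water buffalo farm",
    "d2": "butterfly farm farm",
    "d3": "snail",
}
index = build_index(docs)
and_hits = boolean_and(index, ["butterfly", "farm"])
or_hits = ranked_or(index, ["butterfly", "farm"])
print(and_hits, or_hits)
```

Boolean AND keeps only documents matching all terms, while ranked OR returns every document matching at least one term and orders them by score.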
Browsing: different “classifier” types
List classifier (Howto metadata)
AZList classifier (Title metadata)
DateList classifier (Date metadata)
Custom-made classifier (Title metadata): simple variant of AZList (2 lines of Perl)
Hierarchy classifier (Subject metadata)
Multilevel hierarchy: information specified in auxiliary file
Plugins
format-specific parsing of source documents (and metadata specs)
pipeline: files are passed to each plugin in turn
about a dozen plugins (TEXT, HTML, EMAIL, WORD, RTF, PDF, PS …)
Classifiers
Create browsing indexes
VLists, HLists, and DateLists
Hierarchical structure of lists
List
one-level: single VList
two-level: HList and VList (e.g. SectionList, AZList, DateList)
Collection configuration file
creator [email protected]
maintainer [email protected]
public true
beta true
indexes section:text section:Title document:text
defaultindex section:text
plugins
classifiers
classify HDLList metadata=Title
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto
collectionmeta collectionextra "This is a demonstration collection for the
Greenstone digital library software.\nIt contains a small
subset (11 books) of the Humanity Development Library"
collectionmeta iconcollectionsmall "/gsdl/collect/demo/images/demosm.gif"
collectionmeta iconcollection "/gsdl/collect/demo/images/demo.gif"
collectionmeta .section:Title "section titles"
collectionmeta .document:text "entire books"
collectionmeta .section:text "chapters"
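To make the line-oriented layout of the configuration file concrete, here is a minimal Python sketch that groups each line's arguments under its leading directive. It is an illustration only, not Greenstone's own parser: the function name, the `#` comment handling and the placeholder address are assumptions, and quoted values that span several lines (such as the `collectionextra` text) are not handled.

```python
import shlex
from collections import defaultdict

# Sample modeled on the demo configuration; the creator address is a placeholder.
SAMPLE_CFG = """\
creator someone@example.org
public true
beta true
indexes section:text section:Title document:text
defaultindex section:text
classify Hierarchy hfile=org.txt metadata=Organization sort=Title
classify List metadata=Howto
collectionmeta .section:text "chapters"
"""

def parse_collect_cfg(text):
    """Return {directive: [argument lists]} for single-line directives."""
    config = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        # shlex keeps a double-quoted value together as one argument.
        directive, *args = shlex.split(line)
        config[directive].append(args)
    return dict(config)

cfg = parse_collect_cfg(SAMPLE_CFG)
print(cfg["indexes"][0])
print(cfg["classify"])
```

Directives that may repeat, such as `classify`, end up as several argument lists under one key, which mirrors how each `classify` line adds one browsing structure.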
Greenstone DL software
Access
❖ Accessible via any Web browser
❖ Server runs on Windows and Unix
❖ Collections can be published on CD-ROM
Status updated every 5 secs
indexes document:text
defaultindex document:text
plugin ZIPPlug
plugin GMLPlug
plugin TEXTPlug
plugin HTMLPlug -file_is_url
plugin EMAILPlug
plugin ArcPlug
plugin RecPlug
Alter configuration
Add full-text index of titles ... indexes document:Title
additional indexes line
<gsdlsection>
<metadata>
<gsdlsourcefilename> uu02fe.txt </gsdlsourcefilename>
<gsdldoctype> indexed_doc </gsdldoctype>
<Identifier> HASHa723e7e164df07c833bfc4 </Identifier>
<Title> Freshwater Resources in Arid Lands </Title>
<gsdlassocfile> cover.jpg:image/jpeg </gsdlassocfile>
<gsdlassocfile> p21.jpg:image/jpeg </gsdlassocfile>
<gsdlassocfile> p22.jpg:image/jpeg </gsdlassocfile>
</metadata>
</gsdlsection>
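The hierarchical document model can be explored with a few lines of Python. The sketch below assumes the file can be read as well-formed XML, which is an assumption about the format rather than something the slides state; the sample reuses the title above and adds two nested sections for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: one document section containing two subsections.
SAMPLE = """
<gsdlsection>
  <metadata>
    <Title> Freshwater Resources in Arid Lands </Title>
  </metadata>
  <gsdlsection>
    <metadata>
      <gsdlnum> 1 </gsdlnum>
      <Title> Preface </Title>
    </metadata>
    This is the text of the preface
  </gsdlsection>
  <gsdlsection>
    <metadata>
      <gsdlnum> 2 </gsdlnum>
      <Title> Conclusions </Title>
    </metadata>
  </gsdlsection>
</gsdlsection>
"""

def outline(section, depth=0):
    """Return (depth, title) pairs for a section and all its subsections."""
    meta = section.find("metadata")
    title = meta.findtext("Title", default="").strip() if meta is not None else ""
    rows = [(depth, title)]
    for child in section.findall("gsdlsection"):
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE.strip())
for depth, title in outline(root):
    print("  " * depth + title)
```

Because every `gsdlsection` carries its own `metadata` block, the same lookup works at the document level and at any nested section.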
<gsdlsection>
<metadata> ... </metadata>
<gsdlsection>
<metadata>
<gsdlnum> 1 </gsdlnum>
<Title> Preface </Title>
</metadata>
This is the text of the preface
</gsdlsection>
</gsdlsection>
<gsdlsection>
<metadata>
<gsdlnum> 2 </gsdlnum>
<Title> Conclusions </Title>
</metadata>
<gsdlsection>
<metadata>
<gsdlnum> 1 </gsdlnum>
<Title> Part 1 </Title>
</metadata>
<gsdlsection>
<metadata> ... Conclusions </metadata>
<gsdlsection>
<metadata> ... Part 1 </metadata> ... </gsdlsection>
<gsdlsection>
<metadata>
<gsdlnum> 2 </gsdlnum>
<Title> Part 2 </Title>
</metadata>
</gsdlsection>
</gsdlsection>
collect.cfg for the demo
creator [email protected]
maintainer [email protected]
public true
beta true
indexes section:text section:Title document:text
defaultindex section:text
Each index is written as two parts separated by a colon:
– first part: document, “section” or paragraph
– second part: text or any metadata
indexes section:text section:Title document:text
defaultindex section:text
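As a small illustration of how an index specification splits into its two parts, the following Python sketch parses the `indexes` line of the demo configuration. The names `level` and `content` are labels chosen for this example, not Greenstone terminology.

```python
def parse_index_spec(spec):
    """Split 'section:Title' into ('section', 'Title')."""
    level, _, content = spec.partition(":")
    return level, content

indexes_line = "indexes section:text section:Title document:text"
# Drop the leading directive name, then split each specification.
specs = [parse_index_spec(item) for item in indexes_line.split()[1:]]

# Group what is indexed by the granularity it is indexed at.
by_level = {}
for level, content in specs:
    by_level.setdefault(level, []).append(content)
print(by_level)
```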
Plugin pipeline: files are passed to each plugin in turn until one
is found that can process it
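The pipeline rule can be sketched in a few lines of Python. This is only an illustration: the class names, the `can_process` and `process` methods, and matching by file extension are assumptions made for the sketch, not Greenstone's actual (Perl) plugin interface.

```python
from pathlib import Path

class Plugin:
    """Hypothetical base class: a plugin claims files by extension."""
    extensions = ()

    def can_process(self, path):
        return path.suffix.lower() in self.extensions

    def process(self, path):
        return f"{type(self).__name__} handled {path.name}"

class TextPlugin(Plugin):
    extensions = (".txt",)

class HtmlPlugin(Plugin):
    extensions = (".htm", ".html")

def run_pipeline(path, plugins):
    """Offer the file to each plugin in order; the first that accepts it wins."""
    for plugin in plugins:
        if plugin.can_process(path):
            return plugin.process(path)
    return None  # no plugin recognised the file

pipeline = [TextPlugin(), HtmlPlugin()]
for name in ["uu02fe.txt", "index.html", "cover.xyz"]:
    print(name, "->", run_pipeline(Path(name), pipeline))
```

The order of the plugin list matters in this model: an earlier plugin that accepts a file prevents later ones from seeing it.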
import/index.txt
Used by IndexPlug to add metadata to .gml files
Fields on each line: filename, Subject metadata, Organization metadata, Howto metadata, Magazine metadata
Key line:
key: Subject Organization Howto Magazine
bostid/b22bue 16.11 bostid "start a butterfly farm"
faobetf/fb33fe 14.12 faobfs <Subject>16.11
faobetf/fb34fe 14.12 faobfs "farm snails" <Subject>16.11
bostid/b18ase 16.11 bostid "introduce little-known Asian farm animals with a promising future"
bostid/b20cre 16.11 bostid
bostid/b17mie 16.11 bostid "introduce small animals and micro-livestock in your farm"
bostid/b21wae 16.5 bostid "utilize the Water Buffalo more effectively" <Subject>16.11
ecourier/ec158e 23.15 ecc <Subject>8.1 "<Magazine>The Courier"
ecourier/ec159e 23.15 ecc <Subject>6.1 "<Magazine>The Courier"
ecourier/ec160e 23.15 ecc <Subject>21.1 "<Magazine>The Courier"
wb/wb34te 6.4 wb "achieve gender equality"
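One plausible reading of this file format is sketched below in Python. The interpretation is an assumption drawn from the examples: untagged values are taken to fill the fields of the key line in order, and a value written as `<Field>value` is taken to add a further value to the named field. The function name is hypothetical and this is not IndexPlug's code.

```python
import re
import shlex

KEY_LINE = "key: Subject Organization Howto Magazine"
FIELDS = KEY_LINE.split()[1:]          # positional field names from the key line
TAGGED = re.compile(r"<(\w+)>(.*)")    # e.g. <Subject>16.11

def parse_index_line(line, fields=FIELDS):
    """Return (filename, {field: [values]}) for one line of index.txt."""
    filename, *values = shlex.split(line)  # keeps quoted phrases together
    metadata = {}
    position = 0
    for value in values:
        tagged = TAGGED.fullmatch(value)
        if tagged:
            name, text = tagged.groups()   # explicitly named field
        else:
            name, text = fields[position], value  # next positional field
            position += 1
        metadata.setdefault(name, []).append(text)
    return filename, metadata

courier = parse_index_line(
    'ecourier/ec158e 23.15 ecc <Subject>8.1 "<Magazine>The Courier"'
)
butterfly = parse_index_line('bostid/b22bue 16.11 bostid "start a butterfly farm"')
print(courier)
print(butterfly)
```

Under this reading a document can sit in more than one place in the Subject hierarchy, because the Subject field collects both the positional and the tagged value.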
Hierarchy
hierarchical structure of VLists, HLists, and DateLists
List
one-level hierarchy consisting of a single VList
two-level hierarchy consisting of an HList and a VList
– SectionList
– AZList
– AZSectionList
– DateList
List classifier (Howto)
AZList classifier (Title)
AZList classifier (Title) for the demo collection
DateList classifier
HDLList classifier (Title)
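A two-level classifier such as AZList can be pictured as a horizontal list of letters, each leading to a vertical list of titles. The Python sketch below shows that grouping under stated assumptions: the function name, the "0-9" bucket for titles that do not start with a letter, and two of the three sample titles are made up for illustration.

```python
from collections import defaultdict

def az_list(documents, field="Title"):
    """Group documents by the first letter of a metadata field.

    Returns {letter: sorted titles}: the keys play the role of the
    horizontal list, each value the vertical list beneath it.
    """
    buckets = defaultdict(list)
    for doc in documents:
        title = doc.get(field, "").strip()
        if not title:
            continue  # documents without the field are left out
        first = title[0].upper()
        key = first if first.isalpha() else "0-9"  # assumed catch-all bucket
        buckets[key].append(title)
    return {key: sorted(titles) for key, titles in sorted(buckets.items())}

# The second title appears on an earlier slide; the others are hypothetical.
docs = [
    {"Title": "Butterfly Farming"},
    {"Title": "Freshwater Resources in Arid Lands"},
    {"Title": "Better Farming"},
    {"Howto": "farm snails"},
]
browse = az_list(docs)
print(browse)
```

Passing a different `field` gives the same structure over other metadata, which is the sense in which one classifier type can serve several browsing lists.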
etc/mags.txt
Fields: identifier (matches Magazine metadata value), position in browsing hierarchy, title for hierarchy browser
CERES 1 CERES
"Food and Nutrition Bulletin" 2 "Food and Nutrition Bulletin"
"The Courier" 3 "The Courier"
"SPORE Bulletin" 4 "SPORE Bulletin"
"Boiling Point" 5 "Boiling Point"
"Developing Ideas" 6 "Developing Ideas"
"GATE Magazine" 7 "GATE Magazine"
"Go Between" 8 "Go Between"
"Basin - News" 9 "Basin - News"
etc/org.txt
Fields: identifier (matches Organization metadata value), position in the browsing hierarchy, title to be displayed in the hierarchy browser
accu 1 ACCU
ag21 2 "Agenda 21"
bgz 3 "BASIN - GTZ - SKAT"
bostid 4 BOSTID
cps 5 CPS
cpas 6 "CTA Spore"
cf 7 "Commonwealth Foundation"
csu 8 "Computer Science Unplugged"
dcs 9 "Development Consultancy Services"
ecc 11 "EC Courier"
ecdg8 12 "EC DG8"
echo 13 "Educational Concerns for Hunger Organization"
fao 14 FAO
faobfs 15 "FAO Better Farming series"
ceres 16 "FAO Ceres"
ff 17 "Food First"
Hierarchy classifier (Organization)
1 1 "General reference"
1.2 1.2 "Dictionaries, glossaries, language courses, terminology (all languages)"
2 2 "Sustainable Development, International cooperation, Projects; NGO, Organizations, Poverty and Hunger Alleviation, Basic Human Needs"
2.1 2.1 "Development policy and theory, international cooperation, national planning, national plans"
2.2 2.2 "Development, national planning, national plans"
2.3 2.3 "Project planning and evaluation (incl. project management and dissemination strategies)"
2.4 2.4 "Regional development and planning incl. regional profiles"
2.5 2.5 "Nongovernmental organizations (NGOs) in general, self-help organizations (their role in development). For specific organizations such as women's or peasants' see respective class within 06."
2.6 2.6 "Organizations, institutions, United Nations (general, directories, yearbooks, annual reports, etc.)"
2.6.1 2.6.1 "United Nations"
2.6.2 2.6.2 "International organizations"
2.6.3 2.6.3 "Regional organizations"
2.6.5 2.6.5 "European Community - European Union"
2.7 2.7 "Sustainable Development, Development models and examples;
Hierarchy classifier (Subject)
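The auxiliary hierarchy files share one layout: identifier, position, title. A minimal Python sketch of turning such lines into a nested structure follows. It assumes that dotted positions such as 2.6.1 encode nesting, and the function names and node layout are inventions for this example, not part of Greenstone.

```python
import shlex

# A few lines copied from the subject hierarchy shown above.
SUBJECT_SAMPLE = """\
1 1 "General reference"
1.2 1.2 "Dictionaries, glossaries, language courses, terminology (all languages)"
2.6.1 2.6.1 "United Nations"
2.6.2 2.6.2 "International organizations"
"""

def parse_hierarchy(text):
    """Return a list of (identifier, position, title) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            identifier, position, title = shlex.split(line)
            entries.append((identifier, position, title))
    return entries

def build_tree(entries):
    """Nest entries by dotted position; missing parents get title None."""
    root = {"title": None, "identifier": None, "children": {}}
    for identifier, position, title in entries:
        node = root
        for part in position.split("."):
            node = node["children"].setdefault(
                int(part), {"title": None, "identifier": None, "children": {}}
            )
        node["title"] = title
        node["identifier"] = identifier
    return root

def find(tree, position):
    """Look up the node at a dotted position such as '2.6.1'."""
    node = tree
    for part in position.split("."):
        node = node["children"][int(part)]
    return node

tree = build_tree(parse_hierarchy(SUBJECT_SAMPLE))
print(find(tree, "2.6.1")["title"])
print(sorted(find(tree, "2.6")["children"]))
```

A flat file such as etc/org.txt fits the same functions: every position is a single number, so all entries become children of the root.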
Format strings (interpreted at display time)
format DocumentText "<h3>[Title]</h3>\\n\\n<p>[Text]"
format CL4VList "<br>[link][Howto][/link]"
format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"
Components
HTML
[Text] displays document text
[Title], [Howto] … displays metadata
[link] … [/link] links to document
[parent] refers to parent document
[icon] page or folder icon
{If} if statement
List classifier (Howto)
<br>[link][Howto][/link]
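How a format string turns into HTML at display time can be approximated in Python. The sketch below handles only metadata names in square brackets and the `[link]` … `[/link]` pair; `{If}`, `[parent…]` and `[icon]` are left out. The function name and the document URL are hypothetical, and this is not Greenstone's implementation.

```python
import html
import re

TOKEN = re.compile(r"\[(/?\w+)\]")  # matches [Title], [link], [/link], ...

def expand_format(format_string, metadata, url):
    """Replace bracketed items with a hyperlink or metadata values."""
    def expand(match):
        name = match.group(1)
        if name == "link":
            return f'<a href="{html.escape(url, quote=True)}">'
        if name == "/link":
            return "</a>"
        # Unknown metadata names expand to an empty string.
        return html.escape(metadata.get(name, ""))
    return TOKEN.sub(expand, format_string)

howto_entry = expand_format(
    "<br>[link][Howto][/link]",
    {"Howto": "start a butterfly farm"},
    "/doc/b22bue",  # hypothetical document address
)
print(howto_entry)
```

Doing the replacement in a single pass keeps metadata values from being re-read as format items, which a chain of separate string replacements would not guarantee.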
Formatting search results
format SearchVList "<td valign=top>[link][icon][/link]</td>
<td>{If}{[parent(All': '):Title],[parent(All': '):Title]: }
[link][Title][/link]</td>"
[link][icon][/link]
[link][Title][/link]
Formatting document text
Admin support
❖ Add new user with … privileges
– E.g. admin, collection-building
❖ Check what collections are available
– Including “private” ones not on home page
❖ Check summary info about a collection
– build date
– collection metadata
– interface language preferences
– number of docs/sections/words/bytes
❖ Logging
– switch on or off
– check user logs—every page access is logged
– check system logs—errors are logged
❖ Notify whenever a new collection is built
❖ Browse technical info about the installation
DL admin ≠ computer systems admin
Installing Greenstone
Windows or Unix?
Windows Unix
Serves collections Full version Full version Full version Full version Full version Full version
but no building available available available available available available
Only “Administrators” Only “Administrators” Source code tested, Source code Untested
can install software can install software binaries available tested
Downloading collections
collection                            abbrev     built size (Mb)  download size (Mb)
Arabic demonstration collection       arabic     5                4
Bibliothèque pour le développement    tulane     492              340
  Durable et les Besoins Essentiels
Chinese demonstration collection      chinese    1                470
Collection on critical global issues  ccgi       160              102
Computer science bibliography         csbib      866              112
Computer science technical reports    cstr       2010             1800
Food and Nutrition Library            fnl        155              98
HCI bibliography                      hcibib     36               5
Humanity Development Library          hdl        199              387
Indigenous Peoples                    ipc        8                4
Medical and Health Library            mhl        142              73
Maori newspapers                      niupepa    670              659
Oral history                          ohist      430              421
Project Gutenberg                     gutenberg  510              427
Sahel point doc                       unesco     113              78
The computists weekly                 tcc        22               5
Tidbits magazine                      tidbits    12               5
United Nations University collection  unu        97               71
Virtual Disaster Library              paho       110              73
World Environment Library             envl       309              220
Women’s history                       whist      16               12
Total                                            6363             5366
Demo
❖ Delete and re-install the software
❖ Look around the directory structure
❖ Use The Collector to build a collection from C:\Perl\html\lib\CGI
❖ Alter the Demo collection’s config file
– change the format of the howto list
❖ Browse Maintenance/Admin pages