
Presentation On: Library Building: Submitted To

This document presents a new framework for building digital library collections called "Greenstone 3". It describes how collections are configured using XML files and how documents are represented using METS. It explains the multi-phase process for building collections, including expansion, recognition, encoding, extraction, classification, indexing, and validation. Key improvements over the previous Greenstone system include support for new open standards and more flexible handling of document formats and metadata.

Uploaded by

Aashik Jayswal


SAM HIGGINBOTTOM UNIVERSITY OF AGRICULTURE,

TECHNOLOGY AND SCIENCES, Allahabad, U.P.-211007

PRESENTATION ON: LIBRARY BUILDING

SUBMITTED TO: Dr. M. Srivastava

SUBMITTED BY: Ansh Singh
19MSHVS034
M.Sc. Horticulture (Vegetable Science)
Semester: 1st
Abstract
• This paper introduces a new framework for building digital library
collections and contrasts it with existing systems.
• It describes a radical new step in the development of a widely-used
open-source digital library system, Greenstone, which has evolved
over many years.
• It is supported by a fresh implementation, which forced us to
rethink the entire design rather than make incremental
improvements.
• The redesign capitalizes on the best ideas from the existing system,
which have been refined and developed to open new avenues
through which users can tailor their collections.
• We demonstrate its flexibility by showing how digital library
collections can be extended and altered to satisfy new requirements.
Introduction

• The Greenstone digital library software provides a wide range of tools
for building digital library collections. Based on our own extensive
and varied experience, and that of others, we have designed a new
framework for building digital library collections. We call it
“Greenstone 3” to distinguish it from the earlier system, “Greenstone
2”.
• The new framework is supported by a fresh implementation that is
completely independent of the existing one. It capitalizes on the best
ideas from the existing system, which we have further refined and
developed to open new avenues through which users can tailor their
digital library collections. Several pertinent new open standards have
emerged since the original design many years ago, and a key objective
is to incorporate them into the new design.
Introduction
• The trend towards increasingly open, flexible architectures can
be traced in the development of digital library protocols. The Open
Archives Initiative (OAI) has provided a simple baseline for metadata
access, and subsequent work strives to build component-based, modular
protocols upon it.
• The METS document framework provides an open, extensible
system for representing documents in digital repositories.
Greenstone 3 adopts the same approach for a range of digital
library functions, as this paper demonstrates.
• Set against the move towards standard protocols, today’s
digital library systems must confront an increasing range of
document formats and media, architectural designs for
browsing and classification, indexing requirements, and user
interface techniques.
LIBRARY BUILDING
Collection Configuration

• Collections are designed individually, and the structure of a
collection is encapsulated in an XML file called the “collection
configuration file.” Its contents include build-time configuration
options such as:
• The document types that the collection should recognise
• The metadata access structures, or “classifiers,” that are to be
provided for users to browse the collection, such as by Titles A–Z
• The full-text indexes to build for searching the collection
• The configuration file also contains run-time information about
the collection, for example display options.
A collection configuration file
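To make the configuration file more concrete, the sketch below parses a small, hypothetical collection configuration with Python's standard `xml.etree.ElementTree` module and separates build-time from run-time settings. The element and attribute names (`collectionConfig`, `documentTypes`, `classifiers`, `indexes`, `display`) are assumptions chosen to mirror the options listed above; they are not Greenstone 3's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection configuration file. Element and attribute names
# are illustrative assumptions, not Greenstone 3's actual schema.
CONFIG_XML = """
<collectionConfig name="demo">
  <build>
    <documentTypes>
      <type>html</type>
      <type>pdf</type>
      <type>txt</type>
    </documentTypes>
    <classifiers>
      <classifier name="TitlesAZ" metadata="Title"/>
    </classifiers>
    <indexes>
      <index name="fulltext" level="document"/>
      <index name="titles" metadata="Title"/>
    </indexes>
  </build>
  <runtime>
    <display option="showThumbnails" value="true"/>
  </runtime>
</collectionConfig>
"""


def load_config(xml_text):
    """Split a configuration file into build-time and run-time settings."""
    root = ET.fromstring(xml_text.strip())
    build = root.find("build")
    runtime = root.find("runtime")
    return {
        "name": root.get("name"),
        # Build-time: what to recognise, how to browse, what to index.
        "document_types": [t.text for t in build.iter("type")],
        "classifiers": [dict(c.attrib) for c in build.iter("classifier")],
        "indexes": [i.get("name") for i in build.iter("index")],
        # Run-time: presentation options used when the collection is served.
        "display": {d.get("option"): d.get("value") for d in runtime.iter("display")},
    }


config = load_config(CONFIG_XML)
print(config["document_types"])  # ['html', 'pdf', 'txt']
print(config["indexes"])         # ['fulltext', 'titles']
```

Keeping build-time and run-time options in one file, as the slide describes, means a single document defines both how a collection is constructed and how it is presented.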
METS and Document Representation
• When a document first enters the library, it is given
a unique identifier. That identifier will remain with
the document throughout any subsequent revisions,
and is recorded within the METS framework.

• Having described the configuration controls and
document representation that underpin the new
collection-building architecture, we now describe
the building process itself.
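A minimal sketch of the identifier idea, using only Python's standard library: the identifier is assigned once, stored in the `OBJID` attribute of a METS root element, and copied unchanged into every revised record. `new_identifier`, `make_mets_record` and `revise` are hypothetical helpers, and the UUID-based scheme is an assumption, not how Greenstone actually assigns identifiers.

```python
import uuid
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def new_identifier():
    # Assumption: a random UUID stands in for the library's real id scheme.
    return "doc-" + uuid.uuid4().hex


def make_mets_record(identifier, label):
    """Build a minimal METS wrapper whose OBJID carries the document id."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": identifier, "LABEL": label})
    ET.SubElement(root, f"{{{METS_NS}}}metsHdr")
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    ET.SubElement(struct_map, f"{{{METS_NS}}}div", {"LABEL": label})
    return root


def revise(record, new_label):
    """Create a revised record; the identifier is carried over unchanged."""
    return make_mets_record(record.get("OBJID"), new_label)


draft = make_mets_record(new_identifier(), "Draft essay")
final = revise(draft, "Final essay")
assert draft.get("OBJID") == final.get("OBJID")
print(ET.tostring(final, encoding="unicode"))
```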
Building Digital Library Collections

In our new architecture, collections are built in several
distinct phases, which occur in sequence.
• Expansion. Compressed files such as Zip archives are
expanded, and links to web sites are expanded into lists
of constituent web pages.
• Recognition. All files are sent to the Recognition
Manager, which identifies groups of files as documents.
• Encoding. All the recognized documents are
catalogued for the subsequent phases of building.
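The first three phases can be sketched as small functions applied in order. This is an illustration under stated assumptions: files are held in memory as a name-to-bytes dictionary, only Zip archives are expanded (links to web sites and nested archives are not handled), and the rule that files sharing a stem form one document is a hypothetical stand-in for the Recognition Manager's real logic.

```python
import io
import zipfile
from pathlib import PurePosixPath


def expand(files):
    """Expansion: replace each .zip archive with the files it contains."""
    expanded = {}
    for name, data in files.items():
        if name.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for member in archive.namelist():
                    if not member.endswith("/"):  # skip directory entries
                        expanded[member] = archive.read(member)
        else:
            expanded[name] = data
    return expanded


def recognise(files):
    """Recognition: group files that share a stem into one document (assumed rule)."""
    groups = {}
    for name in sorted(files):
        stem = str(PurePosixPath(name).with_suffix(""))
        groups.setdefault(stem, []).append(name)
    return groups


def encode(groups):
    """Encoding: catalogue each recognised document for the later phases."""
    return [
        {"id": f"D{n}", "stem": stem, "files": names}
        for n, (stem, names) in enumerate(sorted(groups.items()), start=1)
    ]


# Build a small Zip archive in memory to stand in for an uploaded bundle.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("essay.html", "<p>Hello</p>")
    bundle.writestr("essay.jpg", b"\xff\xd8")
files = {"bundle.zip": buffer.getvalue(), "notes.txt": b"plain text"}

catalogue = encode(recognise(expand(files)))
for entry in catalogue:
    print(entry["id"], entry["files"])
# D1 ['essay.html', 'essay.jpg']
# D2 ['notes.txt']
```

Because each phase takes the previous phase's output, the order of the calls mirrors the order of the phases in the list above.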
Building Digital Library Collections

• Extraction. Every document is passed through extractors,
which use special processing algorithms to extract
information from the document (e.g. title, key phrases) or
add metadata stored in special files.
• Classification. Documents are assigned to classifiers (e.g.
topical classifiers, ordered list classifiers) depending on their
inherent and extracted metadata.
• Indexation. Documents are sent to indexers to build indexes
that support later searching.
• Validation. Post-building checks are carried out on the
collection as a whole, and on its constituent documents.
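The remaining phases can be sketched in the same way, each as one function run in sequence by `build`. The title extractor, the Titles A–Z classifier, the word-level inverted index and the validation checks are simplified assumptions chosen for illustration; the slide does not specify these algorithms, and metadata from special files is not modelled.

```python
import re
from collections import defaultdict


def extract(doc):
    """Extraction: supply a Title from the first non-blank line if none is given."""
    metadata = dict(doc.get("metadata", {}))
    lines = [line.strip() for line in doc["text"].splitlines() if line.strip()]
    metadata.setdefault("Title", lines[0] if lines else "")
    return {**doc, "metadata": metadata}


def classify_titles_az(docs):
    """Classification: bucket document ids by the first letter of their Title."""
    buckets = defaultdict(list)
    for doc in docs:
        first = doc["metadata"]["Title"][:1].upper()
        buckets[first if first.isalpha() else "#"].append(doc["id"])
    return dict(sorted(buckets.items()))


def index(docs):
    """Indexation: map each lower-cased word to the ids of documents containing it."""
    inverted = defaultdict(set)
    for doc in docs:
        for term in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            inverted[term].add(doc["id"])
    return dict(inverted)


def validate(docs, inverted, classifier):
    """Validation: report documents missing from the index or the classifier."""
    indexed = set().union(*inverted.values())
    classified = {doc_id for ids in classifier.values() for doc_id in ids}
    problems = []
    for doc in docs:
        if doc["id"] not in indexed:
            problems.append(f"{doc['id']} not indexed")
        if doc["id"] not in classified:
            problems.append(f"{doc['id']} not classified")
    return problems


def build(raw_docs):
    """Run extraction, classification, indexation and validation in sequence."""
    docs = [extract(d) for d in raw_docs]
    titles = classify_titles_az(docs)
    inverted = index(docs)
    return titles, inverted, validate(docs, inverted, titles)


titles, inverted, problems = build([
    {"id": "D1", "text": "Apples and pears\nA short essay."},
    {"id": "D2", "text": "Bananas", "metadata": {"Title": "Zebra facts"}},
    {"id": "D3", "text": "   "},
])
print(titles)    # {'#': ['D3'], 'A': ['D1'], 'Z': ['D2']}
print(problems)  # ['D3 not indexed']
```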
Extensibility: Managers and Plugins

• The central role of plugins in each phase should now be apparent.
However, we have made little mention of the structure of plugins
themselves, or of how they connect to the core system.
• The architecture achieves extensibility through Plugins and
Managers. Each phase of the building process is controlled by a
Manager, e.g. the Recognition Manager or the Indexer Manager.
• Managers are configured through the collection configuration
file when the building process starts; in some cases further
configuration occurs when special files (such as metadata-only files)
are found during building. Each manager coordinates the
plugins for its phase of the build cycle.
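The Manager/Plugin relationship resembles a registry pattern, sketched here in Python under assumptions: plugins register themselves for one phase, a `Manager` loads only the plugins the configuration names for its phase, and `add_plugin` models configuration discovered mid-build (for instance, on meeting a metadata-only file). All names (`PLUGINS`, `plugin`, `Manager`, `word_count`, `uppercase_title`, `CONFIG`) are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry: phase name -> plugin name -> plugin function.
PLUGINS: Dict[str, Dict[str, Callable[[dict], dict]]] = {}


def plugin(phase: str, name: str):
    """Decorator that registers a function as a plugin for one build phase."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        PLUGINS.setdefault(phase, {})[name] = func
        return func
    return register


class Manager:
    """Coordinates the plugins configured for a single phase."""

    def __init__(self, phase: str, plugin_names: List[str]):
        self.phase = phase
        # Only plugins named for this phase are loaded.
        self.plugins = [PLUGINS[phase][n] for n in plugin_names]

    def add_plugin(self, name: str) -> None:
        # Models further configuration discovered while building,
        # e.g. when a metadata-only file is encountered.
        self.plugins.append(PLUGINS[self.phase][name])

    def run(self, doc: dict) -> dict:
        # Pass the document through each plugin in configured order.
        for p in self.plugins:
            doc = p(doc)
        return doc


@plugin("extraction", "word_count")
def word_count(doc: dict) -> dict:
    return {**doc, "words": len(doc["text"].split())}


@plugin("extraction", "uppercase_title")
def uppercase_title(doc: dict) -> dict:
    return {**doc, "title": doc.get("title", "").upper()}


# Assumed shape of the manager settings read from the collection configuration.
CONFIG = {"extraction": ["word_count"]}
managers = {phase: Manager(phase, names) for phase, names in CONFIG.items()}

extraction = managers["extraction"]
print(extraction.run({"text": "once upon a time", "title": "essay"}))
# {'text': 'once upon a time', 'title': 'essay', 'words': 4}
extraction.add_plugin("uppercase_title")
print(extraction.run({"text": "once upon a time", "title": "essay"}))
# {'text': 'once upon a time', 'title': 'ESSAY', 'words': 4}
```

New behaviour is added by registering another plugin function; neither the `Manager` class nor the other plugins need to change.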
Advances over Earlier Work

• The architecture that we have described
capitalizes on lessons learned from the existing
Greenstone digital library system, and
incorporates some very significant improvements.
• When the earlier system was designed
(1998), several important open standards did not
exist, or were available only in draft form. For
example, the METS Document Framework, a key
component of the new architecture, did not yet exist.
Architecture
• In Greenstone 2, collections are constructed in two
phases: importing and building. The first parallels the
Expansion, Recognition and Encoding phases of
Greenstone 3, while the second mirrors the
Extraction, Classification and Indexation phases.
Both phases use the same set of plugins, which are
listed in the collection’s configuration file and loaded
separately in each phase, even though some
plugins pertain to only one phase.
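The cost of loading every plugin in both Greenstone 2 phases can be illustrated with a small calculation. The phase correspondence comes from the text above; the plugin names and the phases each one serves are hypothetical examples.

```python
# Phase correspondence given in the text.
G2_TO_G3 = {
    "import": {"expansion", "recognition", "encoding"},
    "build": {"extraction", "classification", "indexation"},
}

# Hypothetical plugins, each tagged with the Greenstone 3 phases it serves.
PLUGIN_PHASES = {
    "ZipExpander": {"expansion"},
    "HTMLRecogniser": {"recognition", "encoding"},
    "TitleExtractor": {"extraction"},
    "AZClassifier": {"classification"},
}


def wasted_loads(plugin_phases):
    """List (G2 phase, plugin) pairs where a plugin is loaded but does no work."""
    wasted = []
    for g2_phase, g3_phases in G2_TO_G3.items():
        for name, phases in sorted(plugin_phases.items()):
            if not phases & g3_phases:
                wasted.append((g2_phase, name))
    return wasted


total = len(PLUGIN_PHASES) * len(G2_TO_G3)  # every plugin, loaded in both phases
wasted = wasted_loads(PLUGIN_PHASES)
print(f"{len(wasted)} of {total} plugin loads are unnecessary")
# 4 of 8 plugin loads are unnecessary
print(wasted)
# [('import', 'AZClassifier'), ('import', 'TitleExtractor'), ('build', 'HTMLRecogniser'), ('build', 'ZipExpander')]
```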
Plugin details

• Document plugins handle Encoding very
differently in the new design. Originally, all
documents were encoded into the standard
Greenstone Archive Format as soon as they
were encountered. This duplicated content,
and could lose, or render inaccessible, some
information in the original file. The benefit
was that all documents were presented to
subsequent phases in a standardized format.
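One way to avoid the duplication described above is to leave the original file in place and let the METS record point at it. The sketch below, assuming only Python's standard library, builds such a record with a `fileSec` whose `FLocat` references the source file and a `structMap` that links to it by ID. `encode_by_reference` is a hypothetical helper; it illustrates the idea rather than the record Greenstone 3 actually writes.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def encode_by_reference(doc_id, source_path, mimetype):
    """Describe a document in METS by pointing at its untouched original file."""
    mets = ET.Element(f"{{{METS}}}mets", {"OBJID": doc_id})
    # fileSec lists the files; here just the original, referenced not copied.
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", {"USE": "original"})
    entry = ET.SubElement(group, f"{{{METS}}}file", {"ID": "F1", "MIMETYPE": mimetype})
    ET.SubElement(entry, f"{{{METS}}}FLocat",
                  {"LOCTYPE": "URL", f"{{{XLINK}}}href": source_path})
    # structMap gives the document's structure and links to the file by ID.
    struct_map = ET.SubElement(mets, f"{{{METS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS}}}div", {"LABEL": doc_id})
    ET.SubElement(div, f"{{{METS}}}fptr", {"FILEID": "F1"})
    return mets


record = encode_by_reference("doc-42", "import/essay.pdf", "application/pdf")
print(ET.tostring(record, encoding="unicode"))
```

Nothing from the original file is copied, so no information can be lost in conversion; later phases read the original through the reference.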
Example: The Kids Digital Library

• We briefly present an actual digital library that was difficult to
accommodate within the earlier design, and show how it benefits
from the new architecture.
• In the Kids Digital Library each document can belong to several
collections. Some collections are private (e.g. a child’s own
documents), others public. Some documents are unchanging
(accepted final essays); others are under continual revision by
a restricted group of users. Students can annotate the work of
others, and teachers provide feedback too.
• The Kids Digital Library featured some unusual browsing
classifiers such as the “Top ten” and “Latest ten” stories. These
require simple support for feature extraction.
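A rough sketch of these Kids Digital Library features in Python, under stated assumptions: each story carries a read count (the popularity measure behind “Top ten”) and a date added (behind “Latest ten”), and belongs to a set of collections, some of them private. The `Story` fields and the function names are hypothetical; the slide does not say how the original classifiers ranked stories.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class Story:
    id: str
    title: str
    added: date            # assumed basis for "Latest ten"
    reads: int             # assumed popularity measure for "Top ten"
    collections: Set[str]  # a story may belong to several collections
    public: bool = True


def in_collection(stories: List[Story], collection: str,
                  include_private: bool = False) -> List[Story]:
    """Stories in one collection; private ones only when explicitly allowed."""
    return [s for s in stories
            if collection in s.collections and (s.public or include_private)]


def top_ten(stories: List[Story]) -> List[str]:
    """Most-read first; ties broken alphabetically by title."""
    ranked = sorted(stories, key=lambda s: (-s.reads, s.title))
    return [s.id for s in ranked[:10]]


def latest_ten(stories: List[Story]) -> List[str]:
    """Most recently added first."""
    ranked = sorted(stories, key=lambda s: s.added, reverse=True)
    return [s.id for s in ranked[:10]]


stories = [Story(f"S{i}", f"Story {i}", date(2024, 1, 1 + i), i, {"class5"})
           for i in range(12)]
stories.append(Story("P1", "My diary", date(2024, 2, 1), 0,
                     {"class5", "own-work"}, public=False))

print(top_ten(in_collection(stories, "class5")))
# ['S11', 'S10', 'S9', 'S8', 'S7', 'S6', 'S5', 'S4', 'S3', 'S2']
print(latest_ten(in_collection(stories, "own-work", include_private=True)))
# ['P1']
```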
Comparison with other Digital Library Systems

• Cheshire II [3] emphasizes the construction of digital
libraries from original scanned documents.
• The process described for collection building reveals a system
of fixed metadata fields and strict control of the
format in which documents are presented to the system.
• Nowhere is support for feature extraction, expansion of
compressed files, or novel indexes described.
• The CORR is built on the NCSTRL software. The CORR
documentation reports that the system requires documents to
be submitted in a standard source format. No documentation
on CORR or NCSTRL describes the parameters for
collection configuration.
Conclusion

• In the new collection-building architecture we have
described, the building process is segmented into a
number of distinct phases. Once documents are
identified, they are encoded into a flexible, open
framework (METS) and are passed in that form to the
succeeding phases of the build process.
• Within each phase, the elements are componentized to
support greater portability and simpler development. The
build process is configured through a simple XML file
that is readily extensible for future components.
References

• J. R. Davis and C. Lagoze. NCSTRL: Design and deployment of a
globally distributed digital library. Journal of the American Society for
Information Science, 51(3):273–280, 2000.
• Free Software Foundation. GNU make Manual (version 3.80), 2002.
• R. R. Larson and C. Carson. Information access for a digital library:
Cheshire II and the Berkeley Environmental Digital Library. In
Proceedings ASIS ’99, pages 515–535. Information Today, 1999.
• Library of Congress. Metadata Encoding and Transmission Standard
(METS).
• G. W. Paynter, I. H. Witten, S. J. Cunningham, and G. Buchanan.
Scalable browsing for large collections: A case study. In Proceedings of
the Fifth ACM International Conference on Digital Libraries, pages
215–218, June 2000.
References
• C. Sperberg-McQueen and L. Burnard, editors. Guidelines for
Electronic Text Encoding and Interchange. TEI P3 Text Encoding
Initiative, Oxford, 1999.
• H. Suleman and E. A. Fox. Designing protocols in support of digital
library componentization. In Proceedings of the 6th European
Conference on Research and Advanced Technology for Digital
Libraries, pages 568–582. Springer-Verlag, 2002.
• Y. L. Theng, N. Mohd-Nasir, G. Buchanan, B. Fields, H. Thimbleby,
and N. Cassidy. Dynamic digital libraries for children. In
Proceedings of the first ACM/IEEE-CS joint conference on Digital
libraries, pages 406–415. ACM Press, 2001.
• I. H. Witten. Examples of practical digital libraries: collections built
internationally using Greenstone. D-Lib Magazine, 9(3), 2003.
References

• I. H. Witten and D. Bainbridge. How to build a digital library. Morgan
Kaufmann, San Francisco, CA, 2003.
• I. H. Witten, D. Bainbridge, G. W. Paynter, and S. J. Boddie. Importing
documents and metadata into digital libraries: Requirements analysis and an
extensible architecture. In Proceedings of the European Conference on Digital
Libraries, pages 390–405, Sept. 2002.
• I. H. Witten, A. Moffat, and T. C. Bell. Managing gigabytes: compressing and
indexing documents and images (second edition). Morgan Kaufmann, San
Francisco, CA, 1999.
• I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning.
KEA: Practical automatic keyphrase extraction. In ACM DL, pages 254–255,
1999.
THANK YOU
