Copyright
© Attribution Non-Commercial (BY-NC)
Using <METS> and <MODS> to Create XML Standards-based Digital Library Applications

Morgan Cundiff & Nate Trail


Network Development and MARC Standards Office (NDMSO)
Library of Congress
XML is the lingua franca of the Web
» Web pages increasingly use XHTML
» Businesses use it for data exchange/messaging
» Family of technologies can be leveraged
• XML Schema, XSLT, XPath, and XQuery
» Software tools widely available (open source)
• Storage, editing, parsing, validating, transforming and
publishing XML – constantly and actively improved
» Microsoft Office 2003 supports XML as document
format (WordML and ExcelML)
» Web 2.0 applications based on XML (AJAX,
Semantic Web, Web Services, etc.)
XML (Extensible Markup Language)
“XML has become the de-facto standard
for representing metadata descriptions
of resources on the Internet.”

Dr. Jane Hunter, University of Queensland, Australia
Working towards MetaUtopia – A Survey of Current Metadata Research
Interoperability and Standards
“In moving from dispersed digital collections to
interoperable digital libraries, the most important
activity we need to focus on is standards… most
important is the wide variety of metadata standards
[including] descriptive metadata… administrative
metadata…, structural metadata, and terms and
conditions metadata…”

Dr. Howard Besser, New York University
The Next Stage: Moving from Isolated Digital Collections to Interoperable Digital Libraries
XML and Digital Libraries
» Family of XML data standards
• METS – Metadata Encoding and Transmission Standard
• MODS – Metadata Object Description Schema
• MIX – Metadata for Images in XML
• PREMIS – PREservation Metadata Implementation
Strategies
• TEI – Text Encoding Initiative
• EAD – Encoded Archival Description
XML and Digital Libraries
» METS Implementors
• Library of Congress, OCLC, RLG, California Digital Library
(CDL), Harvard, Princeton, National Library of Portugal,
National Library of Wales, Indiana University, Stanford,
New York University, University of Göttingen, Oxford
University, and more …
» METS Software Tools
• METS Toolkit & DRS METS Archive Tool (Dmart) for Audio
Deposit (Harvard), 7train METS Generation Tool (CDL),
MEX Authoring Tools (Das Bundesarchiv), ContentE
(Biblioteca Nacional Digital, Portugal), METS Navigator
(Indiana University DL Program), ResCarta Metadata
Creation Tool (ResCarta Foundation), and more …
» METS listserv: 550 subscribers
XML at LC: A Historical Perspective
» 1995 – American Memory released (not XML-based)
» 1998 – XML 1.0 becomes W3C Recommendation
» 2002 – METS and MODS released
» 2002 – Digital Audio-Visual Preservation Prototyping
Project (first use of METS, MODS, and MIX at LC)
» 2003 – Patriotic Melodies (first use of METS and
MODS in production at LC – later added to
I Hear America Singing)
» 2003 – Veterans History Project database released,
MINERVA project (MODS)
» 2004 – I Hear America Singing released (since
renamed to LC Presents)
» 2004 – Justice Blackmun Papers collection released
» 2006 – National Digital Newspaper Project as
repository submission package at LC (LC and
partners, 1st use of METS, MODS, MIX, PREMIS)
» 2006 – Ser2Dig (Digital Serials workgroup, METS for
multi-volume monographs)
» 2006 – Draft METS profile for “article-level” historical
newspapers
What is METS?
» Metadata Encoding and Transmission Standard
» An XML Schema for the purpose of creating XML
document instances that express…
• the hierarchical structure of digital library objects
• the names and locations of the files that comprise the
digital object
• the associated metadata (e.g., MODS)
» METS can be used as a tool for modeling real
world objects, such as specific document types
What is MODS?
» Metadata Object Description Schema
» An XML Schema designed for expressing
bibliographic data
• Can be viewed as an alternative to the MARC format
• Especially useful for XML-based digital library projects
• Can be used as an extension schema to METS
» Note to catalogers: MODS does not make you
obsolete! The same knowledge and skills needed for
traditional cataloging (AACR, controlled vocabularies, etc.)
still apply. You will only need to learn a different syntax (i.e.,
different from MARC) for expressing bibliographic information
in machine-readable form.
Structure of METS
» There are 7 sections in a METS document
<mets>
<metsHdr/> - METS header (document talks about itself)
<dmdSec/> - Descriptive metadata (MODS, etc.)
<amdSec/> - Administrative metadata (copyright info., etc.)
<fileSec/> - File section (names and locations of files)
<structMap/> - Structural map (relationships of the parts)
<structLink/> - Linking information
<behaviorSec/> - Binding executables/actions to object
</mets>
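The seven-section layout listed above can be sketched in a few lines of Python using only the standard library. This is an illustrative skeleton, not a schema-valid METS document: a real document needs required attributes and content (for example a div inside structMap). The helper name `build_skeleton` is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# The seven top-level sections, in the order the slide lists them.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec",
            "structMap", "structLink", "behaviorSec"]


def build_skeleton():
    """Return an empty <mets:mets> element with one child per section.

    Hypothetical helper: it shows the layout only and does not try to
    satisfy the METS schema.
    """
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root


skeleton = build_skeleton()
# Strip the namespace part of each tag to recover the local names.
section_names = [child.tag.split("}")[1] for child in skeleton]
print(ET.tostring(skeleton, encoding="unicode"))
```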
Wrap Descriptive Metadata in METS
» Use <mdWrap> to embed descriptive metadata
within a METS document
• Metadata wrap section acts as “socket” to hold metadata from other XML schemas or “vocabularies”
<mets>
  …
  <dmdSec>
    <mdWrap>
      <xmlData>
        <!-- insert metadata from different namespace here -->
      </xmlData>
    </mdWrap>
  </dmdSec>
  …
</mets>
<dmdSec> with MODS Extension Schema
» Descriptive metadata section
» MODS data contained inside the metadata wrap section
» Use of prefixes before element names to identify schema
<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods></mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>
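As a rough illustration of the wrapping pattern, the following Python sketch places a small MODS record inside dmdSec / mdWrap / xmlData. The helper names `q` and `wrap_mods` and the ID value `MODS1` are assumptions made for the example. The `ID` and `MDTYPE="MODS"` attributes are not shown on the slide; they are added here because the METS schema, as far as I know, expects them on dmdSec and mdWrap.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
# Registering prefixes makes the serialized output use mets: and mods:.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def q(ns, name):
    """Build a namespace-qualified tag name in ElementTree's {uri}name form."""
    return f"{{{ns}}}{name}"


def wrap_mods(mods, dmd_id):
    """Wrap a MODS element in dmdSec/mdWrap/xmlData (hypothetical helper)."""
    dmd_sec = ET.Element(q(METS_NS, "dmdSec"), {"ID": dmd_id})
    md_wrap = ET.SubElement(dmd_sec, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(md_wrap, q(METS_NS, "xmlData"))
    xml_data.append(mods)
    return dmd_sec


# A one-field MODS record, reusing the title from the slides.
mods = ET.Element(q(MODS_NS, "mods"))
title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
ET.SubElement(title_info, q(MODS_NS, "title")).text = "Bernstein conducts Beethoven"

mets = ET.Element(q(METS_NS, "mets"))
mets.append(wrap_mods(mods, "MODS1"))
serialized = ET.tostring(mets, encoding="unicode")
print(serialized)
```

The prefixes in the output play the role the slide describes: they identify which schema each element belongs to.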
<dmdSec> with <mods:relatedItem>
» The MODS relatedItem element can be nested and can be used to express a hierarchy.
<mets:mets>
  …
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem type="constituent">
            <mods:relatedItem type="constituent"></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  …
</mets:mets>
<mods:mods>
  <mods:titleInfo>
    <mods:title>Bernstein conducts Beethoven</mods:title>
  </mods:titleInfo>
  <mods:name>
    <mods:namePart>Bernstein, Leonard</mods:namePart>
  </mods:name>
  <mods:relatedItem type="constituent">
    <mods:titleInfo>
      <mods:title>Symphony No. 5</mods:title>
    </mods:titleInfo>
    <mods:name>
      <mods:namePart>Beethoven, Ludwig van</mods:namePart>
    </mods:name>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Allegro con moto</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo>
        <mods:partName>Adagio</mods:partName>
      </mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>
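To show how the nested relatedItem elements form a tree, here is a minimal Python sketch that walks a trimmed copy of the record above (name elements omitted) and prints an indented outline. The function names `label` and `outline` are assumptions for this example, and the MODS namespace declaration is added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Bernstein conducts Beethoven</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Symphony No. 5</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>"""


def label(node):
    """Return the node's own title, falling back to partName."""
    for tag in ("title", "partName"):
        found = node.find(f"mods:titleInfo/mods:{tag}", NS)
        if found is not None and found.text:
            return found.text.strip()
    return "(untitled)"


def outline(node, depth=0):
    """Collect (depth, label) pairs by recursing into constituent items."""
    rows = [(depth, label(node))]
    # Only direct children are matched, so each level is visited once.
    for child in node.findall("mods:relatedItem[@type='constituent']", NS):
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(RECORD))
for depth, text in rows:
    print("  " * depth + text)
```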
MODS relatedItem type=“constituent”
» Child element to MODS
» relatedItem element uses MODS content model
• titleInfo, name, subject, physicalDescription, note, etc.
» Makes it possible to create rich analytics for
contained works within a MODS record
» Repeatable and nestable recursively
• Making it possible to build a hierarchical tree structure
» Makes it possible to associate descriptive data
with any structural element
METS 2 Hierarchies: Logical & Physical
» Hierarchy to represent “logical” structure (nested relatedItems)
» Hierarchy to represent “physical” structure (nested div elements)
<mets:mets>
  <mets:dmdSec>
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem>
            <mods:relatedItem></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec></mets:fileSec>
  <mets:structMap>
    <mets:div>
      <mets:div></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
Linking in METS Documents
(XML ID/IDREF links)
[Diagram: the sections of a METS document, connected by XML ID/IDREF links]
DescMD
  mods
    relatedItem
    relatedItem
AdminMD
  techMD (mix)
  sourceMD
  digiprovMD
  rightsMD
fileGrp
  file
  file
StructMap
  div
    div
      fptr
    div
      fptr
What is a METS Profile?
» Description of a class of METS documents
• provides document authors and programmers guidance to
create and process conformant METS documents
» XML document using a schema
• Expresses the requirements that a METS document must
satisfy
» “Data standard” in its own right
• A sufficiently explicit METS Profile may be considered a
“data standard”
» METS Profiles are human-readable prose and not
intended to be “machine actionable”
METS Profile Excerpt
» Recorded Event – structMap requirement
METS Profiles Used in LC Presents
» Sheet Music
» Musical Score (score, score and parts, or a set of parts only)
» Print Material (books, pamphlets, etc)
» Music Manuscript (score or sketches)
» Recorded Event (audio or video)
» PDF Document
» Bibliographic Record
» Photograph
» Compact Disc
» Collection
Multiple Inputs to Common Data Format
» Inputs: New Digital Objects; Legacy Database; Harvest of American Memory Objects
» Output: Profile-based METS Object
» A common data format for searching and display
Example 1: New Digital Object
» METS Musical Score Profile
» Library of Congress March
by John Philip Sousa
» Musical score and parts
Example 2: New Digital Object
» METS Recorded Event Profile
» Juilliard String Quartet
» Sound Recording
Example 3: Legacy Database
» METS Bibliographic Record Profile
» Duke Ellington & His Orchestra
(1962) [Motion Picture]
» Bibliographic Information
Convert database from FileMaker
Pro to a single XML file.
XSLT stylesheet creates 14,000
METS/MODS records.
XSL-FO stylesheet creates single
PDF document.
Example 4: American Memory Harvest
» METS Photograph Profile
» William P. Gottlieb Collection
Portrait of Louis Armstrong
» Photographic object
Convert file of 1600 MARC records, using marc4j, to XML modsCollection (single file).
Used XSLT stylesheet to create 1600 records conforming to the METS photograph profile.
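The slides describe doing the split with an XSLT stylesheet. The sketch below shows the same idea in plain Python instead, as a stand-in rather than the authors' method: each mods record in a modsCollection becomes the descriptive metadata of its own METS document. The sample titles, the `DMD` ID pattern, and the function name `split_collection` are hypothetical. A record that really conformed to the photograph profile would also need a fileSec and structMap.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)

# Hypothetical two-record collection standing in for the 1600-record file.
COLLECTION = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><titleInfo><title>Hypothetical photograph A</title></titleInfo></mods>
  <mods><titleInfo><title>Hypothetical photograph B</title></titleInfo></mods>
</modsCollection>"""


def split_collection(collection_xml):
    """Return one METS element per MODS record in the collection."""
    collection = ET.fromstring(collection_xml)
    documents = []
    records = collection.findall(f"{{{MODS_NS}}}mods")
    for number, mods in enumerate(records, start=1):
        mets = ET.Element(f"{{{METS_NS}}}mets")
        dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": f"DMD{number}"})
        wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
        data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
        data.append(mods)  # reuse the parsed record as the wrapped metadata
        documents.append(mets)
    return documents


documents = split_collection(COLLECTION)
titles = [doc.find(f".//{{{MODS_NS}}}title").text for doc in documents]
print(len(documents), titles)
```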
Logical & Physical Relationships
» mods:mods and mods:relatedItem type="otherVersion" elements create a sequence of 3 nodes
» div TYPE="photo:version" elements correspond to the 3 nodes using a logical sequence of ID to DMDID relationships

Logical (MODS)
<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo>
      <mods:title>Derivative Work 1</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo>
      <mods:title>Derivative Work 2</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>

Physical (METS structMap)
<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10090"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver03">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN1009F"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
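The ID to DMDID relationship can be followed programmatically. The Python sketch below indexes every element that carries an ID, then resolves each photo:version div to the MODS node it points at and to the files it references. It assumes the MODS record sits inside a dmdSec whose ID is MODS1 (the slide does not show where MODS1 is defined), and the function name `resolve_versions` is made up for the example.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS, "mods": "http://www.loc.gov/mods/v3"}

DOC = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                    xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="MODS1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods ID="ver01">
      <mods:titleInfo><mods:title>Original Work</mods:title></mods:titleInfo>
      <mods:relatedItem type="otherVersion" ID="ver02">
        <mods:titleInfo><mods:title>Derivative Work 1</mods:title></mods:titleInfo>
      </mods:relatedItem>
      <mods:relatedItem type="otherVersion" ID="ver03">
        <mods:titleInfo><mods:title>Derivative Work 2</mods:title></mods:titleInfo>
      </mods:relatedItem>
    </mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:structMap>
    <mets:div TYPE="photo:photoObject" DMDID="MODS1">
      <mets:div TYPE="photo:version" DMDID="ver01">
        <mets:div TYPE="photo:image"><mets:fptr FILEID="FN10081"/></mets:div>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver02">
        <mets:div TYPE="photo:image"><mets:fptr FILEID="FN10090"/></mets:div>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver03">
        <mets:div TYPE="photo:image"><mets:fptr FILEID="FN1009F"/></mets:div>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def resolve_versions(root):
    """Return (DMDID, title, [FILEID, ...]) for each photo:version div."""
    # Index every element that carries an ID attribute.
    by_id = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    resolved = []
    for div in root.iter(f"{{{METS}}}div"):
        if div.get("TYPE") != "photo:version":
            continue
        target = by_id.get(div.get("DMDID"))
        title = None
        if target is not None:
            # Only the node's own titleInfo is read, not its descendants'.
            title = target.findtext("mods:titleInfo/mods:title", namespaces=NS)
        file_ids = [fptr.get("FILEID") for fptr in div.iter(f"{{{METS}}}fptr")]
        resolved.append((div.get("DMDID"), title, file_ids))
    return resolved


resolved = resolve_versions(ET.fromstring(DOC))
for dmdid, title, file_ids in resolved:
    print(dmdid, title, file_ids)
```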
Validation in METS Profiles
» 3 levels of validation for METS objects
• Validation of XML (well-formed)
• Validation of METS/MODS (XML Schema)
• Validation of METS Profile
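A minimal Python sketch of the first and third levels follows. Level 2 (XML Schema validation) is left out because the Python standard library has no schema validator; a dedicated validator is needed for that step. The profile rule checked here is a hypothetical one, loosely modeled on the photograph structMap in these slides, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"


def is_well_formed(text):
    """Level 1: does the text parse as XML at all?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def check_photo_profile(text):
    """Level 3: return a list of violations of a hypothetical profile rule.

    Assumed rule: the structMap's root div has TYPE photo:photoObject and
    contains at least one child div with TYPE photo:version.
    """
    root = ET.fromstring(text)
    struct_map = root.find(f"{METS}structMap")
    if struct_map is None:
        return ["missing structMap"]
    problems = []
    top = struct_map.find(f"{METS}div")
    if top is None or top.get("TYPE") != "photo:photoObject":
        problems.append("root div is not TYPE photo:photoObject")
    else:
        versions = [d for d in top.findall(f"{METS}div")
                    if d.get("TYPE") == "photo:version"]
        if not versions:
            problems.append("no photo:version div under the root div")
    return problems


GOOD = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap>
    <mets:div TYPE="photo:photoObject">
      <mets:div TYPE="photo:version"/>
    </mets:div>
  </mets:structMap>
</mets:mets>"""
# Well-formed, but breaks the assumed profile rule.
WRONG_TYPE = GOOD.replace("photo:photoObject", "photo:album")
# Dropping a closing tag makes the text fail level 1.
BROKEN = GOOD.replace("</mets:structMap>", "")

print(is_well_formed(GOOD), check_photo_profile(GOOD))
print(is_well_formed(WRONG_TYPE), check_photo_profile(WRONG_TYPE))
print(is_well_formed(BROKEN))
```

Because profiles are written as human-readable prose, a check like this has to be hand-coded from the profile text for each rule.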
Example 1: Aggregation
» METS Song Collection Object
» Hierarchy of METS documents
Collection members include sheet music, an audio recording, a manuscript, and a biography of the composer.
Example 2: Aggregation
» MODS relatedItem type=“host”
» memberOf:Baseball sheet music

Objects can be related to a virtual aggregate – in this case “Baseball sheet music”
Example 3: Aggregation
» “See also” reference
» MODS relatedItem (no type)
Example: Administrative Metadata
» PREMIS and MIX for digital images
Software/Tools for METS/MODS
» Emacs – text editor (used to edit MODS)
» nxml-mode – plug-in for schema-aware XML editing
» XML Schemas for METS, MODS, MIX, PREMIS
» cygwin – bash shell command line and tools
» Saxon – XSLT transformations
» Xerces – XML validation
» mysql-jdbc-connector – connect to MySQL
» SRU – retrieve records from ILS
» Cocoon – facilities to retrieve and load records,
retrieve XML version of a file system, etc.
» Ant – used to automate all of the above tasks and
create pipelines of multiple tasks (runs from Emacs)
Advantages of METS/MODS Approach
» Ability to model complex library objects
» Ease of change and extension
• both the data and the application
» Use of modern, non-proprietary software tools
» Use of XSLT for…
• Legacy data conversion
• Batch METS creation and editing
• Web displays and behaviors
» Use of a common syntax – XML
• For data creation, editing, storage and searching
» Creation of multiple outputs from XML
• HTML/XHTML for Web display; PDF for printing
» Ease of editing
• Single records or selected batches of records
» Ability to validate data
» Ability to aggregate disparate data sources
» Ease of data management and publishing
» Excellent positioning for the future
• New web applications (Web 2.0)
• Repository submission and OAI harvesting
• Cooperative projects (test interoperability)
