Metadata Standards and Applications
Introduction: Background, Goals, and Course Outline
Version 2.1, February 2009

Cataloging for the 21st Century
Background for this course:
• The first of five courses developed as part of:
  - Bibliographic Control of Web Resources: A Library of Congress Action Plan
  - Action Item 5.3: Continuing Education (CE)
  - Continuing Education Implementation Group (CEIG)
• See course Bibliography for citations

Cataloging for the 21st Century: The five CE course components
1. Rules and Tools for Cataloging Internet Resources
2. Metadata Standards and Applications
3. Principles of Controlled Vocabulary and Thesaurus Design
4. Metadata and Digital Library Development
5. Digital Project Planning and Management Basics

Cataloging for the 21st Century: CE Course Series Objectives
• To equip catalogers to deal with new types of resources and to recognize their unique characteristics
• To equip catalogers to evaluate competing approaches to and standards for providing access to resources
• To equip catalogers to think creatively and work collaboratively with others inside and outside their home institutions
• To ensure that catalogers have a broad enough understanding of the current environment to be able to make their local efforts compatible and interoperable with other efforts
• To prepare catalogers to be comfortable with ambiguity and being less than perfect
• To enable practicing catalogers to put themselves into the emerging digital information environment and to continue to play a significant role in shaping library services

Goals for this Course
• Understand similarities and differences between traditional and digital libraries
• Explore different types and functions of metadata (administrative, descriptive, technical, etc.)
• Understand metadata standards: schemas, data content standards, and data value standards
• Learn how various metadata standards are applied in digital projects, including use of application profiles
• Understand how different controlled vocabularies are used in digital libraries
• Explore approaches to metadata creation, storage, and retrieval
• Learn about metadata interoperability and quality issues

Course objectives
• Increase catalogers’ understanding of metadata for digital resources
• Evaluate competing approaches and standards for managing and providing access to resources
• Enable catalogers to think creatively and work collaboratively
• Increase understanding of the current environment to allow for compatibility among applications
• Increase flexibility in utilizing different kinds of metadata standards
• Allow catalogers to use their expertise to contribute to the emerging digital information environment

Outline of this course
• Session 1. Introduction to Digital Libraries and Metadata
• Session 2. Descriptive Metadata Standards
  - Data content standards, data value standards, data structure standards
  - Specific descriptive metadata formats
  - Relationship models

Outline of this course cont.
• Session 3. Technical and Administrative Metadata Standards
• Session 4. Metadata Syntaxes and Containers
• Session 5. Application Profiles and how they are used in digital libraries

Outline of this course cont.
• Session 6. Controlled Vocabularies
• Session 7. Metadata Creation, Storage and Retrieval
• Session 8. Metadata Interoperability and Quality Issues

1. Introduction to Digital Libraries and Metadata
Metadata Standards and Applications Workshop

Goals of Session
• Understand similarities and differences between traditional and digital libraries, focusing on metadata
• Explore different types and functions of metadata (descriptive, administrative, structural, etc.)

Traditional vs. Digital Libraries
Traditional library characteristics
Digital library characteristics?

What is a digital library?
• A library in which collections are stored in digital formats and accessed by computers. The digital content may be stored locally or accessed remotely via computer networks.
• A type of information retrieval system.

Digital Library Federation (DLF)
• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.”
  - http://www.diglib.org/

http://www.worlddigitallibrary.org/project/english/index.html
How does the environment affect the creation of metadata?

Traditional Libraries
• Firm commitment to standards
  - Specifications for metadata content (e.g., AACR2)
  - Specifications for metadata encoding (e.g., MARC)
  - A variety of syntaxes can be used
• Agreements on quality expectations
• Tradition of sharing, facilitated by bibliographic utilities
• Available documentation and training

Digital Libraries
• No dominant content standard
• A variety of “formats” (or “schemas” or “element sets”)
• Some emerging “federated” agreements, mostly in the world of digital libraries attached to traditional libraries
• Variable quality expectations
• Emerging basis for sharing (OAI-PMH)
• Some documentation and training are becoming available

Environmental Factors
• Differences:
  - Players: New world of metadata not necessarily led by librarians
  - Goals: Competition for users critical for sustainability
  - Resources: No real basis for understanding non-technical needs (including metadata creation and maintenance)
  - Many levels of content responsibility (or none)

Environmental Factors
• Similarities:
  - It’s about discovery (and access, and use, and meeting user needs)
  - Pressure for fast, cheap, and “good enough” (also rich, scalable, and reusable; is that a contradiction?)
  - Wide variety of materials and services
  - Maintenance needs often overlooked

What IS Metadata?
• Some possibilities:
  - Data about data (or data about resources)
  - “Structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.”
  - A management tool
  - Computer-processible, human-interpretable information about digital and non-digital objects

“In moving from dispersed digital collections to
interoperable digital libraries, the most important
activity we need to focus on is standards… most
important is the wide variety of metadata standards
[including] descriptive metadata… administrative
metadata…, structural metadata, and terms and
conditions metadata…”
Howard Besser, NYU

Metadata standards in digital libraries
• Interoperability and object exchange require the use of established standards
• Many digital objects are complex and consist of multiple files
• XML is the de facto standard syntax for metadata descriptions on the Internet
• Complex digital objects require many more forms of metadata than analog objects for their management and use:
  - Descriptive
  - Administrative
  - Technical
  - Digital provenance/events
  - Rights/Terms and conditions
  - Structural

Functions of Metadata
• Discover resources
• Manage documents
• Control IP rights
• Identify versions
• Certify authenticity
• Indicate status
• Mark content structure
• Situate geospatially
• Describe processes

Types of metadata
• Descriptive
• Administrative
• Technical
• Digital provenance
• Rights/Access
• Preservation
• Structural
• Meta-metadata
• Other?

Cataloging and Metadata
• Cataloging is an early form of descriptive metadata
• Underlying models for cataloging are based on AACR2 and MARC 21
• Some new metadata models are emerging (e.g., the DC Abstract Model, and RDA, which is in development)
• Most metadata models are roughly based on attribute/value pairs:
  - <property> = <value>

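The attribute/value model can be illustrated in a few lines of Python. This is a minimal sketch that is not tied to any standard. A description is held as a list of (property, value) pairs. The property names are borrowed from Dublin Core and the values from this course's music-score example, purely for illustration. A property may repeat, so the helper collects values into lists instead of overwriting them.

```python
from collections import defaultdict

# Illustrative only: property names borrowed from Dublin Core.
record = [
    ("title", "3 Viennese arias"),
    ("contributor", "Lawson, Colin (Colin James)"),
    ("contributor", "Bononcini, Giovanni, 1670-1747."),
    ("date", "1984"),
]


def as_lines(pairs):
    """Render each pair in the <property> = <value> form."""
    return [f"{prop} = {value}" for prop, value in pairs]


def group_by_property(pairs):
    """Collect values per property; a repeated property keeps every value."""
    grouped = defaultdict(list)
    for prop, value in pairs:
        grouped[prop].append(value)
    return dict(grouped)


for line in as_lines(record):
    print(line)
```

Keeping the pairs as an ordered list, rather than a plain dict, means repeatable properties are never silently lost.
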
Some differences between traditional and digital libraries
• Metadata only vs. actual object
• Need to understand Web technologies
• Types of media
• Granularity
• User needs
• Web services
• Digitized vs. born digital
Slide by Brian Surratt

One BIG Difference ...
• Catalogers most often are attempting to fit new items into an already existing world of materials
  - The structure already exists, as do the rules for describing them
• Metadata practitioners are generally working with aggregated “stuff,” attempting to find a way to make it accessible
  - Involves broad understanding and the ability to work with others to make decisions that work for whole projects or domains
*Thanks to Marty Kurth for these insights
Questions to ask when selecting metadata standards
• What type of material will be digitized?
• How rich does the metadata need to be?
• Is there information already available?
• Is there a community of practice for this resource type (or types)?
• What is the purpose of the digital project?
• Who will be the audience, and how will they use the content?
• Are there pre-existing digital projects with which this one needs to function? Is there a need to interact with any existing records?
• What tools or systems options are available?

Exercise
• Examine the digital library sites below, and be prepared to discuss differences in user approach and experience. Look for how metadata is used.
  - Alsos: Digital Library for Nuclear Issues (http://alsos.wlu.edu/default.aspx)
  - CSUN Oviatt Library: Digital Collections (http://library.csun.edu/Collections/SCA/digicoll.html)
  - Birdsource (http://www.birdsource.org/)

2. Specific metadata standards: descriptive
Metadata Standards and Applications Workshop

Session 2 Objectives
• Understand the categories of descriptive metadata standards (e.g., data content standards vs. data value standards)
• Learn about the various descriptive metadata standards and the communities that developed and use them
• Learn about some relationship models used in descriptive metadata standards

Outline of Session 2: descriptive metadata
• Types of descriptive metadata standards (e.g., element sets, content standards)
• Specific descriptive metadata standards (e.g., MARC, DC, MODS, EAD…)
• Relationship models

Descriptive metadata
• Most standardized and well-understood type of metadata
• Major focus of the library catalog
• Increasing number of descriptive metadata standards for different needs and communities
• Important for resource discovery
• May support various user tasks

Aspects of descriptive metadata
• Data content standards (e.g., rules: AACR2R/RDA, CCO)
• Data value standards (e.g., values/controlled vocabularies: LCNAF, LCSH, MeSH, AAT)
• Data structure standards (e.g., formats/schemes: DC, MODS, MARC 21)
  - Set of semantic properties, in this context used to describe a resource
• Data exchange/syntax standards (e.g., MARC 21 (ISO 2709), MARCXML, DC/RDF or DC/XML)
  - The structural wrapping around the semantics
• Relationship models

Content Standards: Rules
• AACR2 functions as the content standard for traditional cataloging
• RDA (Resource Description and Access) is the successor to AACR2 that aspires to be independent of a particular syntax
• DACS (Describing Archives: A Content Standard)
• CCO (Cataloging Cultural Objects), a new standard developed by the visual arts and cultural heritage community
• CSDGM (Content Standard for Digital Geospatial Metadata)
• Best practices, guidelines, and policies: less formal content standards

Data Value Standards/Controlled Vocabularies
• Examples of thesauri
  - Library of Congress Subject Headings
  - Art and Architecture Thesaurus
  - Thesaurus of Geographic Names
• Examples of value lists
  - ISO 639-2 language codes
  - MARC Geographic Area codes
  - Other enumerated lists (e.g., MARC/008 lists)
  - Dublin Core Resource Types

Data structure standards (element sets and formats)
• Facilitate database creation and record retrieval
• Flexible, because they are not tied to a particular syntax
• May provide a minimum set of agreed-upon elements that facilitates record sharing and minimal consistency
• Different user communities develop their own standard data element sets
• May differ in complexity and granularity of fields
• Some data element sets become formats/schemes by adding rules such as repeatability, controlled vocabularies used, etc.

Data Structure Standards: Examples
• MARC 21 (http://www.loc.gov/marc/)
• Dublin Core (http://dublincore.org)
• MODS (http://www.loc.gov/standards/mods/)
• IEEE-LOM (http://ltsc.ieee.org/wg12/)
• ONIX (http://www.editeur.org/onix.html)
• EAD (http://www.loc.gov/ead/)

Data Structure Standards: Examples, cont.
• VRA Core (http://www.vraweb.org/projects/vracore4/)
• PBCore (http://www.pbcore.org/)
• TEI (http://www.tei-c.org/index.xml)

What is MARC 21?
• A syntax defined by an international standard, developed in the late 1960s
• As a syntax it has two expressions:
  - Classic MARC (ISO 2709)
  - MARCXML
• A data element set defined by content designation and semantics
• Institutions do not necessarily store records as “MARC 21”, since it is a communications format
• Many data elements are defined by external content rules; a common misperception is that it is tied to AACR2

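Classic MARC's ISO 2709 layout is a 24-character leader, then a directory of 12-character entries (tag, field length, starting position), then the variable fields. The layout is easiest to see by building a record and reading it back. The following is a minimal Python sketch under stated assumptions: the data is ASCII only (the real format counts octets), nothing is validated, and the two field values are borrowed from the Dolley Madison record below. It is not LC's converter or any library's API.

```python
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_iso2709(leader_template, fields):
    """Assemble a record from (tag, data) pairs, computing directory and leader.

    Assumes ASCII data, so character counts equal the byte counts the real
    format uses. Leader positions 0-4 (record length) and 12-16 (base
    address of data) are recalculated; the rest is copied from the template.
    """
    directory = ""
    data = ""
    for tag, value in fields:
        chunk = value + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(data):05d}"
        data += chunk
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + 1  # +1 for record terminator
    leader = (f"{record_length:05d}" + leader_template[5:12]
              + f"{base_address:05d}" + leader_template[17:24])
    return leader + directory + data + RECORD_TERMINATOR


def parse_iso2709(record):
    """Split a record back into its leader and (tag, data) pairs."""
    leader = record[:24]
    base_address = int(leader[12:17])
    directory = record[24:base_address - 1]  # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base_address + start:base_address + start + length]
        fields.append((tag, raw[:-1]))  # strip the field terminator
    return leader, fields


def split_subfields(field_data):
    """Data fields: two indicator characters, then delimited subfields."""
    indicators, *chunks = field_data.split(SUBFIELD_DELIMITER)
    return indicators, [(chunk[0], chunk[1:]) for chunk in chunks if chunk]


fields = [
    ("001", "ocm56835268"),
    ("245", "04" + SUBFIELD_DELIMITER + "aThe Dolley Madison digital edition"
            + SUBFIELD_DELIMITER + "h[electronic resource]"),
]
record = build_iso2709("00000cam  2200000Ia 4500", fields)
leader, parsed = parse_iso2709(record)
print(leader)
print(split_subfields(parsed[1][1]))
```

Because every field's position is recorded in the directory, a reader can jump straight to a tag without scanning the data. The flip side is that the raw record below is almost unreadable without software.
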
02158cam 2200349Ia
450000100130000000300060001300500170001900600190003600600190005500700150007400800
41000890400020001300200015001500430021001650490009001862450119001952460025
003142600065003395380030004045060038004345360153004725200764006255050094013895000
086014836000049015696500040016186510039016586510023016977000026017208560050017469
94001201796ocm56835268 OCoLC20060118051017.0m d szx w s 0 2cr mn---------
041028m20049999vau st 000 0 eng d aVA@cVA@dOCLCQ a0813922917 an-us---an-us-va
aVA@@04aThe Dolley Madison digital editionh[electronic resource] :bletters 1788-June 1836 /
cedited by Holly C. Shulman. 1iAlso known as:aDMDE aCharlottesville, Va. :bUniversity of Virginia
Press,c2004- aMode of access: Internet. aSubscription required for access. aRotunda editions are
made possible by generous grants from the Andrew W. Mellon Foundation and the President's Office
of the University of Virginia. aDolley Payne Madison was the most important First Lady of the
nineteenth century. The DMDE will be the first-ever complete edition of all of her known
correspondence, gathered in an XML-based archive. It will ultimately include close to 2,500 letters.
From the scattered correspondence were gathered letters that have never been previously published.
The range and scope of the collection makes this edition an important scholarly contribution to the
literature of the early republic, women's history, and the institution of the First Lady. These letters
present Dolley Madison's trials and triumphs and make it possible to gain admittance to her mind and
her private emotions and to understand the importance of her role as the national capital's First
Lady.0 aGeneral introduction -- Biographical introduction -- Introduction to the digital edition. aTitle
from the opening screen; description based on the display of Oct. 21, 2004.10aMadison,
Dolley,d1768-1849vCorrespondence. 0aPresidents' spouseszUnited States. 0aUnited
StatesxHistoryy1801-1809. 0aVirginiaxHistory.1 aShulman, Holly Cowan.40
uhttp://rotunda.upress.virginia.edu:8100/dmde/ aC0bVA@

MARC 21 Scope
• Bibliographic Data
  - books, serials, computer files, maps, music, visual materials, mixed material
• Authority Data
  - names, titles, name/title combinations, subjects, series
• Holdings Data
  - physical holdings, digital access, location
• Classification Data
  - classification numbers, associated captions, hierarchies
• Community Information
  - events, programs, services, people, organizations

MARC 21 implementation
• National formats were once common, and there were different flavors of MARC
  - Now most have harmonized with MARC 21 (e.g., CANMARC, UKMARC, MAB)
• Billions of records worldwide
• Integrated library systems support the MARC bibliographic, authority, and holdings formats
• Wide sharing of records for 30+ years
  - OCLC is a major source of MARC records

Streamlining MARC 21 into the future
• Take advantage of XML
  - Establish standard MARC 21 in an XML structure
  - Take advantage of freely available XML tools
• Develop simpler (but compatible) alternatives
  - MODS
• Allow for interoperability with different XML metadata schemas
• Assemble coordinated set of tools
• Provide continuity with current data
• Provide flexible transition options

MARC 21 evolution to XML

MARC 21 in XML – MARCXML
• MARCXML record
  - Exact XML equivalent of a MARC (ISO 2709) record
  - Lossless/round-trip conversion to/from a MARC 21 record
• Simple, flexible XML schema; no need to change it when MARC 21 changes
• Presentations using XML stylesheets
• LC provides converters (open source)
  - http://www.loc.gov/standards/marcxml

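The round-trip claim can be illustrated with Python's standard-library ElementTree. The element and attribute names (record, leader, controlfield, datafield, subfield, tag, ind1, ind2, code) and the namespace follow the MARCXML schema. Everything else is an assumption made for this sketch: the leader is a placeholder, the field data is hand-built, and this is not LC's converter.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def to_marcxml(leader, control_fields, data_fields):
    """Build a MARCXML <record> from parsed MARC parts."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, value in control_fields:
        ET.SubElement(record, q("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(record, q("datafield"),
                              {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return record


def from_marcxml(record):
    """Read the same parts back, to show that nothing is lost."""
    leader = record.findtext(q("leader"))
    control_fields = [(cf.get("tag"), cf.text)
                      for cf in record.findall(q("controlfield"))]
    data_fields = [(df.get("tag"), df.get("ind1"), df.get("ind2"),
                    [(sf.get("code"), sf.text) for sf in df.findall(q("subfield"))])
                   for df in record.findall(q("datafield"))]
    return leader, control_fields, data_fields


leader = "00000cam  2200000Ia 4500"  # placeholder leader for this sketch
control_fields = [("001", "ocm56835268")]
data_fields = [("245", "0", "4", [("a", "The Dolley Madison digital edition"),
                                  ("h", "[electronic resource]")])]

xml_text = ET.tostring(to_marcxml(leader, control_fields, data_fields),
                       encoding="unicode")
print(xml_text)
```

Tags stay numeric and indicators stay single characters. That is why the schema does not need to change when new MARC 21 fields are defined.
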
Example: MARC and MARCXML
• Music record in MARC
• Music record in MARCXML

What is MODS?
• Metadata Object Description Schema
• An XML descriptive metadata standard
• A derivative of MARC
  - Uses language-based tags
  - Contains a subset of MARC data elements
  - Repackages elements to eliminate redundancies
• MODS does not assume the use of any specific rules for description
• Element set is particularly applicable to digital resources

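The language-based tags can be illustrated by moving a few numeric MARC values into MODS elements. This is a hypothetical, highly simplified Python sketch, not LC's MARC-to-MODS mapping. The flat `(tag, subfield) -> value` input dict and the choice of fields are assumptions. Only the element names (titleInfo, title, name, namePart, originInfo, dateIssued) and the MODS v3 namespace come from the standard.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"mods": MODS_NS}
ET.register_namespace("mods", MODS_NS)


def m(name):
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def marc_to_mods(marc):
    """Map a few MARC values to language-based MODS elements.

    `marc` is a hypothetical flat dict such as {("245", "a"): "..."};
    the mapping is a tiny subset chosen only for illustration.
    """
    mods = ET.Element(m("mods"), {"version": "3.3"})
    if ("245", "a") in marc:
        title_info = ET.SubElement(mods, m("titleInfo"))
        ET.SubElement(title_info, m("title")).text = marc[("245", "a")]
    for tag in ("100", "700"):  # personal names, main or added entry
        if (tag, "a") in marc:
            name = ET.SubElement(mods, m("name"), {"type": "personal"})
            ET.SubElement(name, m("namePart")).text = marc[(tag, "a")]
    if ("260", "c") in marc:
        origin = ET.SubElement(mods, m("originInfo"))
        ET.SubElement(origin, m("dateIssued")).text = marc[("260", "c")]
    return mods


record = marc_to_mods({
    ("245", "a"): "The Dolley Madison digital edition",
    ("700", "a"): "Shulman, Holly Cowan.",
    ("260", "c"): "2004-",
})
print(ET.tostring(record, encoding="unicode"))
```

Note how MARC's 100/700 distinction collapses into a single `name` element in this sketch. That is one way a MODS record can be simpler than full MARC.
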
MODS high-level elements
• Title Info
• Name
• Type of resource
• Genre
• Origin Info
• Language
• Physical description
• Abstract
• Table of contents
• Target audience
• Note
• Subject
• Classification
• Related item
• Identifier
• Location
• Access conditions
• Part
• Extension
• Record Info

Advantages of MODS
• Element set is compatible with existing descriptions in large library databases
• Element set is richer than Dublin Core but simpler than full MARC
• Language-based tags are more user-friendly than MARC numeric tags
• Hierarchy allows for rich description, especially of complex digital objects
• Rich description that works well with hierarchical METS objects

Uses of MODS
• Extension schema to METS
  - Rich description works well with hierarchical METS objects
• To represent metadata for harvesting (OAI)
  - Language-based tags are more user-friendly
• As a specified XML format for SRU
• As a core element set for convergence between MARC and non-MARC XML descriptions
• For original resource description in XML syntax that is simpler than full MARC

Example: MODS
• Music record in MODS

Status of MODS
• Open listserv collaboration of possible implementers, coordinated by LC (first half of 2002)
• First comment and use period: second half of 2002
• Now in MODS version 3.3
• Companion for authority metadata (MADS) in version 1.0
• Endorsed as METS extension schema for descriptive metadata (descMD)
• Many institutions expose records as MODS in OAI
• MODS Editorial Committee being formed

A selection of MODS projects
• LC uses of MODS
  - LC web archives
  - Digital library METS projects
• University of Chicago Library
  - Chopin early editions
  - Finding aid discovery
• Digital Library Federation Aquifer initiative
• National Library of Australia
  - MusicAustralia: MODS as exchange format between National Library of Australia and ScreenSoundAustralia
  - Australian national bibliographic database metadata project
• See: MODS Implementation registry, http://www.loc.gov/mods/registry.php

What is MADS?
• Metadata Authority Description Schema
• A companion to MODS for authority data using XML
• Defines a subset of MARC authority elements using language-based tags
• Elements have the same definitions as the equivalent MODS elements
• Metadata about people, organizations, events, subjects, time periods, genres, geographics, occupations

MADS elements
• authority
  - name
  - titleInfo
  - topic
  - temporal
  - genre
  - geographic
  - hierarchicalGeographic
  - occupation
• related (same subelements)
• variant (same subelements)
• note
• affiliation
• url
• identifier
• fieldOfActivity
• extension
• recordInfo

Uses of MADS
• As an XML format for information about people, organizations, titles, events, places, concepts
• To expose library metadata in authority files
• To allow for linking to an authoritative form and fuller description of the entity from a MODS record
• For a simpler authority record than full MARC 21 authorities
• To integrate bibliographic/authority information for presentation

Example: MADS Name Record
<mads xmlns="http://www.loc.gov/mads/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/mads/ http://www.loc.gov/mads/mads.xsd">
  <authority>
    <name type="personal">
      <namePart>Smith, John</namePart>
      <namePart type="date">1995-</namePart>
    </name>
  </authority>
  <variant type="other">
    <name>
      <namePart>Smith, J</namePart>
    </name>
  </variant>
  <variant type="other">
    <name>
      <namePart>Smith, John J</namePart>
    </name>
  </variant>
  <note type="history">Biographical note about John Smith.</note>
  <affiliation>
    <organization>Lawrence Livermore Laboratory</organization>
    <dateValid>1987</dateValid>
  </affiliation>
</mads>

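One practical use of a MADS record is generating "see" references from variant forms to the authorized heading. This is a minimal Python sketch that reads a trimmed copy of the name record above with ElementTree. Joining name parts with ", " is an assumption made for display. MADS itself does not prescribe how parts are concatenated.

```python
import xml.etree.ElementTree as ET

NS = {"mads": "http://www.loc.gov/mads/"}

# Trimmed copy of the name record above, with the namespace declared.
MADS_XML = """<mads xmlns="http://www.loc.gov/mads/">
  <authority>
    <name type="personal">
      <namePart>Smith, John</namePart>
      <namePart type="date">1995-</namePart>
    </name>
  </authority>
  <variant type="other"><name><namePart>Smith, J</namePart></name></variant>
  <variant type="other"><name><namePart>Smith, John J</namePart></name></variant>
</mads>"""


def heading(name):
    """Join a name's parts into one display string (display choice is ours)."""
    return ", ".join(part.text.strip() for part in name.findall("mads:namePart", NS))


def see_references(xml_text):
    """Map every variant form to the authorized heading."""
    root = ET.fromstring(xml_text)
    authorized = heading(root.find("mads:authority/mads:name", NS))
    return {heading(name): authorized
            for name in root.findall("mads:variant/mads:name", NS)}


print(see_references(MADS_XML))
```

A discovery system could load such a mapping so that a search on either variant lands on records filed under the authorized form.
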
Example: MADS Organization Record

<mads xmlns="http://www.loc.gov/mads/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/mads/ http://www.loc.gov/mads/mads.xsd">
  <authority>
    <name type="corporate">
      <namePart>Unesco</namePart>
    </name>
  </authority>
  <related type="parentOrg">
    <name>
      <namePart>United Nations</namePart>
    </name>
  </related>
  <variant type="expansion">
    <name>
      <namePart>United Nations Educational, Scientific and Cultural Organization</namePart>
    </name>
  </variant>
</mads>

Some MADS implementations
• Irish Virtual Research Library and Archive Repository Prototype
• Perseus Digital Library (Tufts)
• Mark Twain Papers (University of California)
• Library of Congress/National Library of Egypt

Dublin Core: Simple
• Simple to use
• All elements are optional and repeatable
• No order of elements prescribed
• Interdisciplinary/international
• Promotes semantic interoperability
• Controlled vocabulary values may be expressed, but not the sources of the values

Dublin Core Elements
Fifteen elements in Simple DC:
• Title
• Creator
• Date
• Description
• Contributor
• Coverage
• Subject
• Publisher
• Identifier
• Relation
• Rights
• Format
• Source
• Language
• Type

“Qualified” Dublin Core
• Includes the 15 terms of the original DC Metadata Element Set, plus:
  - Additional properties and sub-properties
  - Examples: abstract, accessRights, audience, instructionalMethod, rightsHolder, provenance
• Provides:
  - A fuller set of properties with specific requirements for content
  - A namespace that includes all properties
  - The ability to specify explicit value vocabularies

DC Structure
• Property/element refinements are used at the element level in DC/XML
  - Relationships between properties and sub-properties are explicit in the formal representation
  - XML “nesting” is not used to express those relationships
• Encoding schemes (Syntax & Vocabulary)
  - Syntax ES: essentially a datatype that communicates the format or structure of a string
  - Vocabulary ES: includes values from an identified controlled vocabulary or list

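The two kinds of encoding scheme check different things. A syntax scheme asks whether the string has the right shape. A vocabulary scheme asks whether the value comes from the named list. This is a minimal Python sketch. The W3CDTF check covers only the date forms YYYY, YYYY-MM and YYYY-MM-DD, and it does not check days against month lengths. The DCMIType set lists the terms of the DCMI Type Vocabulary. The `conforms` function is a hypothetical helper, not part of any DCMI tool.

```python
import re

# Syntax encoding scheme: a subset of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD).
# Time-of-day forms and per-month day limits are left out of this sketch.
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

# Vocabulary encoding scheme: the terms of the DCMI Type Vocabulary.
DCMI_TYPE = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def conforms(value, scheme):
    """Check a value against a declared encoding scheme."""
    if scheme == "W3CDTF":  # syntax ES: is the string in the right format?
        return W3CDTF_DATE.fullmatch(value) is not None
    if scheme == "DCMIType":  # vocabulary ES: is the value in the list?
        return value in DCMI_TYPE
    raise ValueError(f"no checker for scheme {scheme!r}")


print(conforms("1984", "W3CDTF"), conforms("Text", "DCMIType"))
```

In Simple DC, a value like "text" carries no declared scheme, so there is nothing to check it against. Declaring the scheme is what makes this kind of validation possible.
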
Advantages: Dublin Core
• International and cross-domain
• Developed via an open review process
• Increases the efficiency of discovery/retrieval of digital objects
• Rich element set (qualified DC) provides a framework of elements that aids the management of information
• Ease of mapping to other metadata standards promotes collaboration around cultural/educational information

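Mapping to other standards usually means a crosswalk table from source fields to DC elements. This is a hypothetical, deliberately small MARC-to-Simple-DC crosswalk in Python. It is not LC's published crosswalk. It maps names to dc:contributor, as this course's Simple DC example does. It carries only selected subfields (so the $d dates are dropped) and removes trailing ISBD punctuation crudely.

```python
# Hypothetical, deliberately small MARC -> Simple DC crosswalk
# (illustrative subset only; NOT LC's published crosswalk).
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "contributor",
    ("700", "a"): "contributor",
    ("650", "a"): "subject",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("300", "a"): "format",
    ("010", "a"): "identifier",
}

TRAILING_PUNCTUATION = " /:;,."


def crosswalk(fields):
    """fields: list of (tag, [(code, value), ...]); returns DC element -> values."""
    dc = {}
    for tag, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element is not None:
                # Crude removal of trailing ISBD punctuation.
                dc.setdefault(element, []).append(value.rstrip(TRAILING_PUNCTUATION))
    return dc


fields = [
    ("245", [("a", "3 Viennese arias :"),
             ("b", "for soprano, obbligato clarinet in B flat, and piano.")]),
    ("260", [("b", "Nova Music,"), ("c", "1984.")]),
    ("700", [("a", "Lawson, Colin"), ("q", "(Colin James)")]),
    ("700", [("a", "Bononcini, Giovanni,"), ("d", "1670-1747.")]),
]
print(crosswalk(fields))
```

The mapping is lossy by design. Several MARC fields collapse into one DC element, and unmapped subfields disappear. That tradeoff is what makes DC useful as a common denominator.
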
Uses of Dublin Core
• Minimal standard for OAI-PMH
• Core element set in some other schemas
• Switching vocabulary for more complex schemas

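Because every OAI-PMH repository must support unqualified DC (metadataPrefix `oai_dc`), a harvester can always fall back to it. This is a minimal Python sketch of the harvesting side. It builds a ListRecords request and reads titles and the resumptionToken out of a response. The base URL `http://example.org/oai` and the sample response are hypothetical, and no network call is made. The verb, arguments, and namespaces follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # resumptionToken is exclusive: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def titles_from_response(xml_text):
    """Pull dc:title values out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:metadata/oai_dc:dc/dc:title"
    return [title.text for title in root.iterfind(path, NS)]


def next_token(xml_text):
    """Return the resumptionToken, or None when the list is complete."""
    token = ET.fromstring(xml_text).find(".//oai:resumptionToken", NS)
    return token.text if token is not None and token.text else None


# Hypothetical, hand-written response standing in for a live repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>3 Viennese arias</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai"))
print(titles_from_response(SAMPLE_RESPONSE), next_token(SAMPLE_RESPONSE))
```

A real harvester would loop, requesting `list_records_url(base, token)` until `next_token` returns None.
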
Ex.: Simple Dublin Core

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>3 Viennese arias: for soprano, obbligato clarinet in B flat, and piano.</dc:title>
  <dc:contributor>Lawson, Colin (Colin James)</dc:contributor>
  <dc:contributor>Bononcini, Giovanni, 1670-1747.</dc:contributor>
  <dc:contributor>Joseph I, Holy Roman Emperor, 1678-1711.</dc:contributor>
  <dc:subject>Operas--Excerpts, Arranged--Scores and parts</dc:subject>
  <dc:subject>Songs (High voice) with instrumental ensemble--Scores and parts</dc:subject>
  <dc:subject>M1506 .A14 1984</dc:subject>
  <dc:date>1984</dc:date>
  <dc:format>1 score (12 p.) + 2 parts ; 31 cm.</dc:format>
  <dc:type>text</dc:type>
  <dc:identifier>85753651</dc:identifier>
  <dc:language>it</dc:language>
  <dc:language>en</dc:language>
  <dc:publisher>Nova Music</dc:publisher>
</metadata>

Ex.: Qualified Dublin Core

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dcterms="http://purl.org/dc/terms/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title xml:lang="en">3 Viennese arias: for soprano, obbligato clarinet in B flat, and piano.</dc:title>
  <dc:contributor>Lawson, Colin (Colin James)</dc:contributor>
  <dc:contributor>Bononcini, Giovanni, 1670-1747.</dc:contributor>
  <dc:contributor>Joseph I, Holy Roman Emperor, 1678-1711.</dc:contributor>
  <dc:subject xsi:type="dcterms:LCSH">Operas--Excerpts, Arranged--Scores and parts</dc:subject>
  <dc:subject xsi:type="dcterms:LCSH">Songs (High voice) with instrumental ensemble--Scores and parts</dc:subject>
  <dc:subject xsi:type="dcterms:LCC">M1506 .A14 1984</dc:subject>
  <dc:date xsi:type="dcterms:W3CDTF">1984</dc:date>
  <dcterms:extent>1 score (12 p.) + 2 parts ; 31 cm.</dcterms:extent>
  <dc:type xsi:type="dcterms:DCMIType">Text</dc:type>
  <dc:identifier>85753651</dc:identifier>
  <dc:language xsi:type="dcterms:RFC3066">it</dc:language>
  <dc:language xsi:type="dcterms:RFC3066">en</dc:language>
  <dc:publisher>Nova Music</dc:publisher>
</metadata>

Status of DC
• Dublin Core Metadata Element Set, version 1.1
  - ISO 15836:2003; ANSI/NISO Z39.85-2007; IETF RFC 5013
• Updated encoding guidelines
  - Proposed recommendation for expressing DC description sets using XML (Sept. 2008)
  - Final recommendation for expressing DC metadata using HTML/XHTML (Aug. 2008)

A selection of DC projects
• National Science Digital Library http://nsdl.org/
  - Aggregates a wide variety of source collections using Dublin Core
• Kentuckiana Digital Library http://kdl.kyvl.org/
  - For item-level metadata, on DLXS software
• Gathering the Jewels http://www.gtj.org.uk/
  - Website for Welsh cultural history using DC standards
• MusicBrainz http://musicbrainz.org/
  - User-maintained community music recording database; extension of DC

Encoded Archival Description (EAD)
• Standard for electronic encoding of finding aids for archival and manuscript collections
• Expressed as an SGML/XML DTD
• Supports archival descriptive practices and standards
• Supports discovery, exchange, and use of data
• Developed and maintained by the Society of American Archivists; LC hosts the website

EAD, continued
• Based on the needs of the archival community
• Good at describing blocks of information, poor at providing granular information
• Some uptake by the museum community
• Not a content standard
• EAC is a companion standard for information about creators of archival material
• Example: http://purl.dlib.indiana.edu/iudl/findingaids/lilly/InU-Li-VAA1292

Benefits of an EAD finding aid
• Documents the interrelated descriptive information of an archival finding aid
• Preserves the hierarchical relationships existing between levels of description
• Represents descriptive information that is inherited by one hierarchical level from another
• Supports element-specific indexing and retrieval of descriptive information

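Hierarchical description and inheritance can be illustrated by walking the components of a finding aid. This is a Python sketch that assumes DTD-style EAD with no namespace and a small hypothetical finding aid. The sketch applies one simple rule: a component with no unitdate of its own takes the date of the nearest ancestor. That rule is an assumption made for illustration, not something EAD mandates. The element names (archdesc, did, unittitle, unitdate, dsc, c01…c12) are EAD's.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding aid in DTD-style EAD (no namespace), trimmed to <did> data.
FINDING_AID = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Smith family papers</unittitle>
      <unitdate>1850-1900</unitdate>
    </did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters to Mary</unittitle><unitdate>1862</unitdate></did>
        </c02>
        <c02 level="file">
          <did><unittitle>Letters from John</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Unnumbered and numbered component elements.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def walk(element, titles, inherited_date, out):
    """Depth-first walk; a component without <unitdate> inherits its parent's."""
    did = element.find("did")
    title = did.findtext("unittitle") if did is not None else None
    date = did.findtext("unitdate") if did is not None else None
    if title:
        titles = titles + [title]
    effective_date = date or inherited_date
    if element.tag in COMPONENT_TAGS:
        out.append((" > ".join(titles), effective_date))
    # Top-level components sit inside <dsc>; deeper ones are direct children.
    container = element.find("dsc") if element.tag == "archdesc" else element
    if container is not None:
        for child in container:
            if child.tag in COMPONENT_TAGS:
                walk(child, titles, effective_date, out)
    return out


def components(xml_text):
    archdesc = ET.fromstring(xml_text).find("archdesc")
    return walk(archdesc, [], None, [])


for path, date in components(FINDING_AID):
    print(f"{path} [{date}]")
```

Each file-level result carries its full context path. Element-specific indexing can exploit this, for example by showing "Correspondence" as the series for both letters files.
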
Text Encoding Initiative (TEI)
• A consortium of institutions and research projects that collectively maintains and develops guidelines for the representation of texts in digital form
• Includes representation of title pages, chapter breaks, and tables of contents, as well as poetry, plays, charts, etc.
• The TEI file contains a “header” that holds metadata about the digital file and about the original source

TEI
<fileDesc>
  <titleStmt>
    <title type="main">A chronicle of the conquest of Granada</title>
    <author>
      <name type="last">Irving</name>
      <name type="first">Washington</name>
      <dateRange from="1783" to="1859">1783-1859</dateRange>
    </author>
  </titleStmt>
  <extent>455 kilobytes</extent>
  <publicationStmt>
    <publisher>University of Virginia Library</publisher>
    <pubPlace>Charlottesville, Virginia</pubPlace>
    <date value="2006">2006</date>
    <availability status="public">
      <p n="copyright">Copyright &copy; 2006 by the Rector and Visitors of the University of Virginia</p>
      <p n="access">Publicly accessible</p>
    </availability>
  </publicationStmt>

MORE TEI
  <sourceDesc>
    <biblFull>
      <titleStmt>
        <title type="main">A chronicle of the conquest of Granada</title>
        <author>
          <name type="last">Irving</name>
          <name type="first">Washington</name>
          <dateRange from="1783" to="1859">1783-1859</dateRange>
        </author>
      </titleStmt>
      <extent>345 p. ; 21 cm.</extent>
      <publicationStmt>
        <publisher>Carey, Lea &amp; Carey</publisher>
        <pubPlace>Philadelphia</pubPlace>
        <date value="1829">1829</date>
        <idno type="LC call number">DP122 .I7 1829a</idno>
        <idno type="UVa Title Control Number">a1599744</idno>
      </publicationStmt>
    </biblFull>
  </sourceDesc>

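The split between the digital file and the original source shows up clearly when the two parts of the header are read side by side. This is a Python sketch over a trimmed, hypothetical header modeled on the Granada example. It assumes TEI P4-style markup with no namespace. It also assumes the source's bibliographic description is wrapped in `biblFull`, since TEI does not allow `titleStmt` directly inside `sourceDesc`.

```python
import xml.etree.ElementTree as ET

# Trimmed header modeled on the Granada example (no namespace, P4 style).
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt><title type="main">A chronicle of the conquest of Granada</title></titleStmt>
    <extent>455 kilobytes</extent>
    <publicationStmt>
      <publisher>University of Virginia Library</publisher>
      <date value="2006">2006</date>
    </publicationStmt>
    <sourceDesc>
      <biblFull>
        <titleStmt><title type="main">A chronicle of the conquest of Granada</title></titleStmt>
        <extent>345 p. ; 21 cm.</extent>
        <publicationStmt>
          <publisher>Carey, Lea &amp; Carey</publisher>
          <date value="1829">1829</date>
        </publicationStmt>
      </biblFull>
    </sourceDesc>
  </fileDesc>
</teiHeader>"""


def describe(desc):
    """Pull a few fields from a fileDesc or a biblFull (same child layout)."""
    return {
        "title": desc.findtext("titleStmt/title"),
        "extent": desc.findtext("extent"),
        "publisher": desc.findtext("publicationStmt/publisher"),
        "date": desc.find("publicationStmt/date").get("value"),
    }


def digital_vs_source(xml_text):
    """Return (digital file description, original source description)."""
    file_desc = ET.fromstring(xml_text).find("fileDesc")
    return describe(file_desc), describe(file_desc.find("sourceDesc/biblFull"))


digital, source = digital_vs_source(HEADER)
print(digital)
print(source)
```

Because `biblFull` reuses the same child elements as `fileDesc`, one helper can describe both. That reuse is part of what makes the TEI header approachable for catalogers.
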
Selection of TEI projects
• American Memory (uses a TEI-conformant DTD)
  - http://memory.loc.gov/ammem/index.html
• Early Canada Online
  - http://www.canadiana.org/
• Victorian Women Writers Project
  - http://www.indiana.edu/~letrs/vwwp/index.html
• Oxford Text Archive
  - http://ota.ahds.ac.uk/

VRA Core
• Maintained by the Visual Resources Association. Version 4 is currently in beta release.
• A categorical organization for the description of works of visual culture as well as the images that document them.
• Consists of a metadata element set and an initial blueprint for how those elements can be hierarchically structured.
credit: K. Edward Lay

Work, Collection or Image
• work, collection or image
• agent
• culturalContext
• date
• description
• inscription
• location
• material
• measurements
• relation
• rights
• source
• stateEdition
• stylePeriod
• subject
• technique
• textRef
• title
• workType

Advantages of VRA
• Allows description of both the original and the digital object
• Level of granularity greater than Dublin Core, less than MARC; supports a specific discipline
• Content rules have now been developed (CCO)

VRA Core
<work>
  <titleSet>
    <title pref="true" source="LC NAF">Rotunda</title>
  </titleSet>
  <agentSet><agent>
    <name type="personal" vocab="LC NAF" refid="n 79089957">Jefferson, Thomas</name>
    <dates type="life">
      <earliestDate>1743</earliestDate><latestDate>1826</latestDate>
    </dates>
    <role>architect</role>
    <culture>American</culture>
  </agent></agentSet>
  <agentSet><agent>
    <name type="personal" vocab="LC NAF" refid="n 50020242">White, Stanford</name>
    <dates type="life">
      <earliestDate>1853</earliestDate><latestDate>1906</latestDate>
    </dates>
    <role>architect</role>
    <culture>American</culture>
    <notes>Architect of 1896-1897 renovation</notes>
  </agent></agentSet>

52
<dateSet>
More VRA Core
<date type="construction">
<earliestDate>1822</earliestDate><latestDate>1826</latestDate>
</date>
<notes>Construction begun October, 1822, completed September, 1826.<notes>
</dateSet>
<dateSet>
<date type=“destruction">
<earliestDate>1895</earliestDate>
</date>
<notes>Burned October 27, 1895.</notes>
</dateSet>
<dateSet>
<date type=“renovation">
<earliestDate>1896</earliestDate><latestDate>1897</latestDate>
</date>
<notes>Rebuilt to designs of Stanford White, 1896-1897.</notes>
</dateSet>
<locationSet><location type="site">
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville, Virginia</name>
</location></locationSet>
</work>
More VRA Core
<image>
<titleSet>
<title type="descriptive">general view</title>
</titleSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF"
refid="n 82111472">Lay, K. Edward</name>
<culture>American</culture>
<role>photographer</role>
</agent></agentSet>
<dateSet><date type="creation">
<earliestDate>1990</earliestDate>
<latestDate>2000</latestDate>
</date></dateSet>
<locationSet><location type="repository">
<name type="corporate">University of Virginia Library</name>
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville</name>
</location></locationSet>
<rightsSet>
<rights type="credit">K. Edward Lay</rights>
<rights type="access">Publicly accessible</rights>
</rightsSet>
</image>
credit: K. Edward Lay
More VRA Core
<image>
<titleSet>
<title type="descriptive">View from gymnasia</title>
</titleSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF"
refid="n 82111472">Lay, K. Edward</name>
<culture>American</culture>
<role>photographer</role>
</agent></agentSet>
<dateSet><date type="creation">
<earliestDate>1995</earliestDate>
<latestDate>2000</latestDate>
</date></dateSet>
<locationSet><location type="repository">
<name type="corporate">University of Virginia Library</name>
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville</name>
</location></locationSet>
<rightsSet>
<rights type="credit">K. Edward Lay</rights>
<rights type="access">Publicly accessible</rights>
</rightsSet>
</image>
credit: K. Edward Lay
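Because a VRA Core record is ordinary XML, the work-level data can be read back out by a script. The following is a minimal Python sketch using the standard library's ElementTree; it assumes the unnamespaced element names used on these slides and an abbreviated copy of the Rotunda record. A production VRA Core 4 record would normally declare the VRA namespace, and the paths would then need to include it.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Rotunda <work> record, without a namespace.
VRA_WORK = """
<work>
  <titleSet><title pref="true" source="LC NAF">Rotunda</title></titleSet>
  <agentSet><agent>
    <name type="personal" vocab="LC NAF" refid="n 79089957">Jefferson, Thomas</name>
    <role>architect</role>
  </agent></agentSet>
  <agentSet><agent>
    <name type="personal" vocab="LC NAF" refid="n 50020242">White, Stanford</name>
    <role>architect</role>
  </agent></agentSet>
  <dateSet><date type="construction">
    <earliestDate>1822</earliestDate><latestDate>1826</latestDate>
  </date></dateSet>
  <dateSet><date type="destruction">
    <earliestDate>1895</earliestDate>
  </date></dateSet>
</work>
"""


def summarize_work(xml_text):
    """Return the preferred title, (agent, role) pairs, and typed date ranges."""
    work = ET.fromstring(xml_text.strip())
    title = work.findtext("titleSet/title[@pref='true']")
    agents = [(agent.findtext("name").strip(), agent.findtext("role"))
              for agent in work.iterfind("agentSet/agent")]
    # latestDate may be absent (as for the destruction date), which gives None.
    dates = {date.get("type"): (date.findtext("earliestDate"), date.findtext("latestDate"))
             for date in work.iterfind("dateSet/date")}
    return title, agents, dates


title, agents, dates = summarize_work(VRA_WORK)
print(title, agents, dates)
```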
A Selection of VRA Core Projects
 Luna Imaging
 http://www.lunaimaging.com/index.html
 ARTstor
 http://www.artstor.org/
 Visual Information Access (VIA), Harvard University Libraries
 http://via.lib.harvard.edu/via/

Learning Object Metadata (LOM)

 An array of related standards for the description of ‘learning objects’ or ‘learning resources’
 Most are based on the efforts of the IEEE LTSC (Institute of Electrical and Electronics Engineers Learning Technology Standards Committee) and the IMS Global Learning Consortium, Inc.
 Tends to be very complex, with few implementations outside of government and industry
 One well-documented implementation is CanCore
Uses of IEEE-LOM

 Describe and share information about learning objects individually or as a group
 Export as LOM in XML or RDF
 Most descriptive elements mapped to Dublin Core
 Can be used with IMS VDEX (Vocabulary Definition Exchange)

A Selection of IEEE-LOM Projects
 CanCore
 http://www.cancore.ca/
 LearnAlberta.ca
 http://www.learnalberta.ca/
 Grades K-12
 Learning Object Repository Network
 http://lorn.flexiblelearning.net.au/Home.aspx

What is ONIX for Books?

 Originally devised to simplify the provision of book product information to online retailers (the name stood for ONline Information eXchange)
 The first version was flat XML; the second version added hierarchy, with elements repeated within ‘composites’
 Maintained by EDItEUR, with the Book Industry Study Group (New York) and Book Industry Communication (London)
 Includes marketing- and shipping-oriented information: book jacket blurb and photos, full size and weight information, etc.

Advantages of ONIX

 Provides publisher information in a widely used standard format
 Promotes exchange of information among publishers, vendors, booksellers and libraries
 “Value-added” information (e.g., book jacket images, reviews) benefits booksellers (online commercial sites) and libraries (online catalogs)
 More [information], faster [transmission], cheaper? better?

A selection of ONIX projects

 http://www.editeur.org/onix.html
 ONIX Administrators
 EDItEUR (European & international)
 Book Industry Communication (BIC) (European and
international)
 Book Industry Study Group, Inc. (BISG) (U.S.)
 Amazon.com
 Association of American Publishers
 Baker & Taylor
 Barnes & Noble
 Google
 McGraw-Hill Companies

PBCore

 Public Broadcasting Core element set
 http://www.pbcore.org/
 Built on Dublin Core (but does not comply with the Abstract Model)
 Provides a shared descriptive language for public broadcasters
 Used for television, radio and Web activities
PBCore Elements

 53 elements arranged in 15 containers and 3 sub-containers
 Four classes:
 Intellectual Content (title, subject, description, audienceLevel …)
 Intellectual Property (creator, contributor, publisher, rightsSummary)
 Instantiation (dateCreated, formatFileSize, formatDuration, formatTracks, language)
 Extensions

Uses of PBCore

 Shared descriptive language for public broadcasters
 Useful for both public search and viewing, and internal asset management
 Facilitates production collaborations
 Ability to parse programs into short segments for Web distribution and niche community needs

Selection of PBCore projects
 Wisconsin Public Television (WPT) Media Library Online http://wptmedialibrary.wpt.org/
 Kentucky Educational Television (KET) http://www.ket.org/
 New Jersey Network (NJN) http://www.njn.net/

Modeling metadata: why use models?
 To understand what entities you are dealing with
 To understand which metadata are relevant to which entities
 To understand relationships between different entities
 To organize your metadata to make it more predictable (and to be able to use automated tools)
Descriptive metadata models
 Conceptual models for bibliographic and authority data
 Functional Requirements for Bibliographic Records (FRBR)
 Functional Requirements for Authority Data (FRAD)
 Dublin Core Abstract Model (DCAM)
 Some other models:
 CIDOC Conceptual Reference Model (emerged from the museum community)
 INDECS (for intellectual property rights)
 There are many conceptual models intended for different purposes

Bibliographic relationships (pre-FRBR)
 Tillett’s Taxonomy (1987)
 Equivalence
 Derivative
 Descriptive
 Whole-part
 Accompanying
 Sequential
 Shared-characteristic

Bibliographic relationships in MARC/MODS
 MARC linking entry fields
 MARC relationships by specific encoding format
 Authority vs. bibliographic vs. holdings
 MODS relationships
 relatedItem types
 Relationship to METS document

FRBR (1996)

 IFLA Study Group on the Functional Requirements for Bibliographic Records
 Focused on the bibliographic record rather than the catalog
 Used an entity-relationship model, rather than descriptive analysis without a structural model
 Broader in scope than previous studies

FRBR Entities

 Bibliographic entities: works, expressions, manifestations, items
 Responsible parties: persons, corporate bodies
 Subject entities: concepts, objects, events, places

Group 1 Entities and Relationships
[Diagram: Work → Expression → Manifestation → Item, with paired relationship labels]
 A Work “is realized through” an Expression; an Expression “realizes” a Work
 An Expression “is embodied in” a Manifestation; a Manifestation “embodies” an Expression
 A Manifestation “is exemplified by” an Item; an Item “exemplifies” a Manifestation
[Thanks to Sherry Vellucci for this slide.]
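The Group 1 relationships can be pictured as a chain from the abstract work down to a single copy. The Python sketch below models them as nested dataclasses; it is a simplification, since FRBR itself allows one manifestation to embody more than one expression, which a strict tree cannot show. The publisher names and barcodes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    barcode: str  # hypothetical local identifier for one copy


@dataclass
class Manifestation:
    publisher: str
    items: List[Item] = field(default_factory=list)  # "is exemplified by"


@dataclass
class Expression:
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)  # "is embodied in"


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)  # "is realized through"


def items_of(work: Work) -> List[Item]:
    """Follow Work -> Expression -> Manifestation -> Item and collect every item."""
    return [item
            for expression in work.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items]


# Hypothetical holdings: an English text and a French translation of one play.
play = Work("Antony and Cleopatra", [
    Expression("English", [Manifestation("Publisher A", [Item("b1"), Item("b2")])]),
    Expression("French", [Manifestation("Publisher B", [Item("b3")])]),
])
```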


DC Abstract Model
 Reaffirms the One-to-One Principle
 Defines the ‘statement’ as the atomic level
 Distinguishes between “description” and “description set”:
 Description: “A description is made up of one or more statements about one, and only one, resource.”
 Description Set: “A description set is a set of one or more descriptions about one or more resources.”
 RDA vocabularies are being developed to use the DC Abstract Model

[Diagram: the DCAM description model]
 A description set is instantiated as a record
 A description set is grouped into one or more descriptions
 Each description has one or more statements
 Each statement has one property and one value
 A value is a value string, OR a rich value that is represented by one or more representations, OR a related description
A record consists of descriptions, using properties and values. A value can be a string or a pointer to another description.

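The description model maps onto a few plain data types. A minimal Python sketch, assuming hypothetical example.org URIs for the play and its author, and borrowing the FOAF `name` property for the author's name (an assumption; DCAM does not prescribe it). The `creator` statement's value is a related description rather than a string, and `meets_cardinality` checks only the "one or more" rules quoted in the DCAM definitions.

```python
from dataclasses import dataclass, field
from typing import List, Union

DCTERMS = "http://purl.org/dc/terms/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # assumption: borrowed for the author


@dataclass
class Description:
    resource_uri: str  # about one, and only one, resource
    statements: List["Statement"] = field(default_factory=list)


@dataclass
class Statement:
    property_uri: str               # exactly one property
    value: Union[str, Description]  # a value string, or a related description


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)

    def resources(self):
        """A description set may describe one or more resources."""
        return {d.resource_uri for d in self.descriptions}

    def meets_cardinality(self):
        # One or more descriptions, each made up of one or more statements.
        return bool(self.descriptions) and all(d.statements for d in self.descriptions)


# Hypothetical URIs for the play and its author.
PLAY = "http://example.org/work/antony-and-cleopatra"
AUTHOR = "http://example.org/person/shakespeare"

author = Description(AUTHOR, [Statement(FOAF_NAME, "William Shakespeare")])
play = Description(PLAY, [
    Statement(DCTERMS + "title", "Antony and Cleopatra"),
    Statement(DCTERMS + "created", "1606"),
    Statement(DCTERMS + "subject", "Roman history"),
    Statement(DCTERMS + "creator", author),  # the value points to another description
])
record = DescriptionSet([play, author])
```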
Basic model: Resource with properties

A Play has the title “Antony and Cleopatra,” was written in 1606 by William Shakespeare, and is about “Roman history”
… related to other Resources

An Exercise

 Each group will be given a printout of a digital object
 Create a brief metadata record based on the standard assigned to your group
 Take notes about the issues and decisions made
 Appoint a spokesperson to present the metadata record created & the issues involved (5-10 minutes)
3. Technical and administrative metadata standards
Metadata Standards and Applications Workshop
Goals of session
 To understand the different types of administrative metadata standards
 To learn what types of metadata are needed for digital preservation
 To learn the importance of technical, structural and rights metadata in digital libraries

Types of administrative
metadata
 Provides information to help manage a resource
 Preservation metadata
 Technical characteristics
 Information about actions on an object
 Structural metadata may be considered administrative; indicates how compound objects are put together
 Rights metadata
 Access rights and restrictions
 Preservation rights and restrictions

PREMIS: introduction

 Preservation metadata that includes subcategories:
 Technical metadata
 Relationships (structural and derivative)
 Digital provenance (what actions were performed on objects)
 Rights

Preservation metadata includes:
 Provenance:
 Who has had custody/ownership of the digital object?
 Authenticity:
 Is the digital object what it purports to be?
 Preservation Activity:
 What has been done to preserve it?
 Technical Environment:
 What is needed to render and use it?
 Rights Management:
 What IPR (intellectual property rights) must be observed?
 Makes digital objects self-documenting across time
[Figure: preservation metadata travels with the content: 10 years on, 50 years on, forever!]

PREMIS Data Dictionary

 May 2005: Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group
 237-page report includes:
 PREMIS Data Dictionary 1.0
 Accompanying report
 Special topics, glossary, usage examples
 Data Dictionary: comprehensive, practical resource for implementing preservation metadata in digital archiving systems
 Used the Framework as a starting point
 Detailed description of metadata elements
 Guidelines to support implementation, use and management
 Based on a deep pool of institutional experience in setting up and managing operational capacity for digital preservation
 Set of XML schemas developed to support use of the Data Dictionary
Scope of data dictionary

 Implementation independent
 Descriptive metadata out of scope
 Technical metadata applying to all or
most format types
 Media or hardware details are limited
 Business rules are essential for working
repositories, but not covered
 Rights information for preservation
actions, not access

What PREMIS is and is not
 What PREMIS is:
 Common data model for organizing/thinking about
preservation metadata
 Guidance for local implementations
 Standard for exchanging information packages between
repositories
 What PREMIS is not:
 Out-of-the-box solution: need to instantiate as metadata elements in repository system
 All needed metadata: excludes business rules, format-specific technical metadata, descriptive metadata for access, non-core preservation metadata
 Lifecycle management of objects outside repository
 Rights management: limited to permissions regarding actions taken within repository
PREMIS Data Model
[Diagram: the five PREMIS entities: Intellectual Entities, Objects, Events, Agents, Rights Statements]

Types of information covered in PREMIS (by entity type)
 Object: Object ID; Preservation level; Object characteristics (format, size, etc.); Storage; Environment; Digital signatures; Relationships; Linking identifiers
 Event: Event ID; Event type; Event date/time; Event outcomes; Linking identifiers
 Agent: Agent ID; Agent name
 Rights: Rights statement; Granting agent; Permission granted

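Linking identifiers are what tie the PREMIS entities together: an Event records which Objects and Agents it involved, and an object's digital provenance is the chain of such events. A minimal Python sketch; the field names are simplified for illustration rather than the official PREMIS semantic units, and the identifiers and event data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PremisObject:
    object_id: str
    format_name: str
    size_bytes: int


@dataclass
class Agent:
    agent_id: str
    name: str


@dataclass
class Event:
    event_id: str
    event_type: str
    date_time: str  # same ISO 8601 layout throughout, so string order is chronological
    outcome: str
    linked_object_ids: List[str] = field(default_factory=list)  # links to Objects
    linked_agent_ids: List[str] = field(default_factory=list)   # links to Agents


def provenance(object_id: str, events: List[Event]) -> List[Event]:
    """Events linked to one object, oldest first."""
    return sorted((e for e in events if object_id in e.linked_object_ids),
                  key=lambda e: e.date_time)


# Hypothetical repository data.
repository = Agent("agent-1", "Example Repository")
master = PremisObject("obj-1", "image/tiff", 25_000_000)
events = [
    Event("ev-2", "migration", "2008-05-01T10:00:00", "success", ["obj-1"], ["agent-1"]),
    Event("ev-1", "ingestion", "2007-01-15T09:00:00", "success", ["obj-1"], ["agent-1"]),
    Event("ev-3", "ingestion", "2007-02-01T00:00:00", "success", ["obj-2"], ["agent-1"]),
]
```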
PREMIS Maintenance Activity

 Permanent Web presence, hosted by the Library of Congress
 Centralized destination for information, announcements, and other PREMIS-related resources
 Discussion list for PREMIS implementers (PIG list)
 Coordinate future revisions of Data Dictionary and XML schema
 Editorial committee guides development and revisions
http://www.loc.gov/standards/premis/
Current activities
 PREMIS Implementers’ Registry
 http://www.loc.gov/standards/premis/premis-registry.html
 Revision of data dictionary and schemas (March 2008)
 Guidelines for use of PREMIS within METS have been developed
 PREMIS tutorials
 One- or one-and-a-half-day tutorials have been given in several locations: Glasgow, Boston, Stockholm, Albuquerque, Washington, San Diego, Berlin
 Training materials available from LC

Why is PREMIS important to
catalogers?
 As we take responsibility for more digital
materials, we need to ensure that they can be
used in the future
 Most preservation metadata will be generated
from the object, but catalogers may need to
verify its accuracy
 Catalogers may need to play a role in assessing
and organizing digital materials
 Understanding the structure of complex digital objects
 Determining significant properties that need to be
preserved

Technical metadata for
images
 NISO Z39.87 and MIX
 Adobe and XMP
 Exif
 IPTC (International Press
Telecommunications Council)/XMP
 Some of these deal with embedded
metadata in images

Metadata For Images in
XML (MIX)
 An XML Schema designed for expressing
technical metadata for digital still images
 Based on the NISO Z39.87 Data
Dictionary – Technical Metadata for
Digital Still Images
 Can be used standalone or as an
extension schema with METS/PREMIS

Using MIX
 Includes
 Characteristics that apply to all or most object types,
e.g. size, format (elements also in PREMIS)
 Format-specific metadata for images
 Some examples of format-specific metadata elements in MIX:
 Image width
 Color space, color profile
 Scanner metadata
 Digital camera settings
 The most well-developed of the format-specific technical metadata standards

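Much technical metadata is extracted from the file rather than keyed by hand. As a small illustration, the Python sketch below reads width, height and bit depth from a PNG file's header, the kind of values MIX records for still images. It assumes PNG input, does no CRC checking, and uses a synthetic header in place of a real file; a real workflow would rely on an established characterization tool and then map its output into MIX.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_basic_characteristics(data: bytes):
    """Return (width, height, bit_depth) read from a PNG file's IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then data that
    # begins with big-endian 32-bit width and height and a 1-byte bit depth.
    _length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR":
        raise ValueError("IHDR chunk missing")
    width, height, bit_depth = struct.unpack(">IIB", data[16:25])
    return width, height, bit_depth


# Synthetic header: signature + IHDR (640 x 480, 8-bit, color type 2) + dummy CRC.
sample = (PNG_SIGNATURE
          + struct.pack(">I4s", 13, b"IHDR")
          + struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
          + b"\x00\x00\x00\x00")
print(png_basic_characteristics(sample))
```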
Technical metadata for
textual objects
 textMD is an XML Schema designed for
expressing technical metadata for textual
objects
 Developed at New York University;
maintenance transferred to LC
 Includes format-specific technical metadata for text, e.g.
 byte order
 character set encoding
 font script

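textMD records byte order and character set encoding. When a text file begins with a Unicode byte-order mark (BOM), both can be inferred from its first few bytes. A minimal Python sketch using the BOM constants in the standard `codecs` module; files without a BOM return `(None, None)`, and their encoding has to be established by other means.

```python
import codecs

# Checked in this order because the UTF-32 little-endian mark begins with
# the same two bytes as the UTF-16 little-endian mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32", "little"),
    (codecs.BOM_UTF32_BE, "UTF-32", "big"),
    (codecs.BOM_UTF8, "UTF-8", None),  # byte order does not apply to UTF-8
    (codecs.BOM_UTF16_LE, "UTF-16", "little"),
    (codecs.BOM_UTF16_BE, "UTF-16", "big"),
]


def sniff_byte_order(data: bytes):
    """Return (encoding, byte order) from a leading BOM, or (None, None)."""
    for bom, encoding, order in BOMS:
        if data.startswith(bom):
            return encoding, order
    return None, None
```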
Technical metadata for
audio and video
 Not as well developed as other technical metadata
 Complexities of file formats require expertise to develop these
 LC developed XML technical metadata schemas in 2003/2004 for the LC Audiovisual Prototype Project, used with METS; these were widely implemented because of the lack of other schemas
 Audio and video technical metadata schemas are under development by expert organizations
 The Moving Image Catalog (MIC) project is also experimenting with these

Technical metadata for
multimedia (MPEG-7)
 A multimedia content description standard,
associated with the content itself
 Intended to allow fast and efficient searching
 Formally called Multimedia Content
Description Interface
 Does not deal with the actual encoding of
moving pictures and audio (as MPEG-1, MPEG-2
and MPEG-4 do)
 Intended to provide complementary functionality to the previous MPEG standards

Structural metadata

 Supports the intended presentation, use and navigation of an object
 Binds the parts together; expresses relationships between parts of a multipart object
 Examples of structural metadata expressions:
 METS structMap
 PREMIS relationship elements
 EAD hierarchical structure

Rights metadata

 Rights schemas with limited scope
 Rights Expression Languages (REL) for managing intellectual property rights, particularly by rights owners
 Rights information is not well understood
 Different laws in different jurisdictions
 Machine actionable vs. human understandable
 Rights take different forms
 Legal statutes, e.g. copyright
 Contractual rights, e.g. licenses

Rights schemas with limited
scope
 METS Rights
 Access rights for use with METS objects
 Rights declarations
 Rights holder
 Context
 CDL copyright schema
 Specifically copyrights, not other intellectual property
rights
 Information you need to know to assess copyright
status (e.g. creators, rights holders, dates, jurisdiction)
 Note that a new field 542 has been added to MARC 21
with information about copyright to help the cataloger
assess the status of the item (based on the CDL work)

Rights schemas with limited
scope cont.
 PREMIS Rights
 Focused on rights for preservation rather than access
 Revision of PREMIS data dictionary expanded this area
 Allows for extensibility, i.e. inserting another rights
schema
 Creative Commons
 Allows creators to choose a license for their work
 Simple rights statements that fit a lot of situations
 http://creativecommons.org/
 An example: MIC catalog

Rights metadata for specific
object types
 PLUS for images
 MPEG-21 REL for moving images,
etc.
 ONIX for licensing terms
 Full Rights Expression Languages
 XrML/MPEG-21
 ODRL

Exercise

 Provide administrative/technical
metadata for the object used in the
descriptive metadata exercise

4. Metadata syntaxes and containers
Metadata Standards and Applications Workshop
Goals of session
 Understand syntaxes used for encoding information, including HTML, XML and RDF
 Discover how container formats are used for managing digital resources and their metadata

Overview of Syntaxes

 HTML, XHTML: Hypertext Markup Language; eXtensible Hypertext Markup Language
 XML: Extensible Markup Language
 RDF/XML: Resource Description Framework

HTML

 HyperText Markup Language
 HTML 4 is the current standard
 HTML is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879
 Widely regarded as the standard publishing language of the World Wide Web
 HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents
XHTML

 XML-ized version of HTML 4.0; tightens up HTML to match XML syntax
 Requires ending tags, quoted attributes, lower case, etc., to conform to XML requirements
 XHTML is a W3C specification, redefining HTML as an XML implementation rather than an SGML implementation
 Imposes requirements that are intended to lead to more well-formed, valid XML, easier for browsers to handle
An XHTML Example
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="Using Dublin Core" />
<meta name="DC.creator" content="Diane Hillmann" />
<meta name="DC.subject" content="documents; Bibliography; Model; meta; Glossary; mark; matching;
refinements; XHTML; Controlled; Qualifiers; Hillmann; mixing; encoding; Diane; Issues; Appendix; elements;
Simple; Special; element; trademark/service; DCMI; Dublin; pages; Section; Resource; Grammatical; Qualified;
XML; Using; Principles; Documents; licensing; OCLC; formal; Usageguide; Roles; Implementing; Contents;
Guidelines; Expressing; Table; Syntax; Content; Element; DC.dot; Home; document; Metadata; RDF/XML;
Website; metadata; privacy; schemes; liability; profiles; Elements; Copyright; Localization; schemas;
HTML/XHTML; Core; Guide; registry; Research; contact; Scope; Projects; languages; Maintenance; Application;
available; Internationalization; HTML; Recommended; link; Purpose; Abstract; AskDCMI; Vocabularies; software;
Storage; Introduction" />
<meta name="DC.description" content="This document is intended as an entry point for users of Dublin Core. For
non-specialists, it will assist them in creating simple descriptive records for information resources (for example,
electronic documents). Specialists may find the document a useful point of reference to the documentation of
Dublin Core, as it changes and grows." />
<meta name="DC.publisher" content="Dublin Core Metadata Initiative" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" content="text/html" />
<meta name="DC.format" content="31250 bytes" />
<meta name="DC.identifier" scheme="DCTERMS.URI" content="http://dublincore.org/documents/usageguide/" />

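Dublin Core embedded in XHTML this way can be harvested by reading the `meta` elements whose `name` starts with `DC.`. A minimal Python sketch using the standard library's `html.parser`, run against a few lines copied from the XHTML example; it ignores the `link` declarations and any `DCTERMS.` names, which is a simplification.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core <meta name="DC.*"> tags as (element, scheme, content)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta ... /> here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.records.append((name[len("DC."):], attributes.get("scheme"),
                                 attributes.get("content")))


# A few lines from the XHTML example.
SAMPLE = """
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="Using Dublin Core" />
<meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
<meta name="DC.format" content="text/html" />
<meta name="DC.format" content="31250 bytes" />
"""

parser = DCMetaParser()
parser.feed(SAMPLE)
print(parser.records)
```

Note that `DC.format` appears twice, so a harvester has to treat elements as repeatable.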
XML

 Extensible Markup Language
 A ‘metamarkup’ language: has no fixed tags or elements
 Strict grammar imposes structure designed to be read by machines
 Two levels of conformance:
 well-formed: conforms to general grammar rules
 valid: conforms to a particular XML schema or DTD (document type definition)

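The first level of conformance can be tested with any XML parser. In the minimal Python sketch below, ElementTree raises `ParseError` on text that is not well-formed; checking validity against a schema or DTD would need a validating tool, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML (well-formedness only, not validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<name>Louis Armstrong</name>"))
print(is_well_formed("<name>Louis Armstrong</Name>"))  # tag names are case-sensitive
```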
XML: Extensible Markup
Language
 A technical approach to convey meaning with data
 Not a natural language, although uses natural languages
 <姓名>Louis Armstrong</姓名>
 <name>Louis Armstrong</name>
 Not a programming language
 Language in the sense of:
 A limited set of tags defines the elements that can be
used to markup data
 The set of tags and their relationships need to be
explicitly defined (e.g., in XML schema)
 We can build software that uses XML as input and processes it in a meaningful way
 You can define your own markup and schemas
XML is the lingua franca of
the Web
 Web pages increasingly use at least XHTML
 Business use for data exchange/messaging
 Family of technologies can be leveraged
 XML Schema, XSLT, XPath, and XQuery
 Software tools widely available (open source)
 Storage, editing, parsing, validating, transforming and publishing XML
 Microsoft Office 2003 supports XML as a document format (WordML and ExcelML)
 Web 2.0 applications based on XML (AJAX, Semantic Web, Web Services, etc.)
An XML Schema may
define:
 What elements may be used
 Of which types
 Any attributes
 In which order
 Optional or compulsory
 Repeatability
 Subelements
 Logic

Anatomy of an XML Record

 XML declaration: prepares the processor to work with the document and states the XML version
 Namespaces (use xmlns:prefix and a URI to attach a prefix to each element and attribute)
 Distinguish between elements and attributes from different vocabularies that might share a name (but not necessarily a definition) by associating them with URIs
 Group all related elements from an application so software can deal with them
 The URIs are the standardized part, not the prefix, and they don’t necessarily lead anywhere useful, even if they look like URLs
XML Namespaces

xmlns:dc="http://purl.org/dc/elements/1.1/"
[Diagram labels: XML Namespace (xmlns); Namespace Prefix (dc); Namespace Identifier (http://purl.org/dc/elements/1.1/)]

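The point that the URI, not the prefix, is the standardized part can be checked directly. In this Python sketch (the sample records are made up for illustration), ElementTree resolves both prefixes to the same expanded name, so the lookup succeeds either way, while a record that binds `dc` to a different URI does not match.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# The same title under two different prefixes bound to the same namespace URI.
WITH_DC_PREFIX = ('<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
                  '<dc:title>Using Dublin Core</dc:title></record>')
WITH_OTHER_PREFIX = ('<record xmlns:x="http://purl.org/dc/elements/1.1/">'
                     '<x:title>Using Dublin Core</x:title></record>')


def dc_title(xml_text):
    root = ET.fromstring(xml_text)
    # ElementTree expands every prefixed name to "{namespace URI}localname",
    # so this lookup matches on the URI and ignores the prefix in the file.
    return root.findtext("dc:title", namespaces={"dc": DC})


print(ET.fromstring(WITH_OTHER_PREFIX)[0].tag)
```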
XML Anatomy Lesson

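<mods:genre authority="marcgt">bibliography</mods:genre>
[Diagram labels: Start Tag (<mods:genre authority="marcgt">), containing the Name (mods:genre) and an Attribute (authority="marcgt"); Content (bibliography); End Tag (</mods:genre>)]
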
XML Validation

[Diagram: an XML Instance and an XML Schema are passed to a Validator, which reports the instance as Valid or Invalid]

XML Schema Example
<xs:element name="software" minOccurs="0"
maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="swName" minOccurs="1"
maxOccurs="1" type="xs:string"></xs:element>
<xs:element name="swVersion" minOccurs="0"
maxOccurs="1" type="xs:string"></xs:element>
<xs:element name="swType" minOccurs="1"
maxOccurs="1" type="xs:string"></xs:element>
<xs:element name="swOtherInformation"
minOccurs="0" maxOccurs="unbounded" type="xs:string">
</xs:element>
<xs:element name="swDependency"
minOccurs="0" maxOccurs="unbounded" type="xs:string">
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
Will the following XML
instance validate?
<software>
  <swName>Windows</swName>
  <swVersion>2000</swVersion>
  <swType>Operating System</swType>
</software>

How about this?

<swVersion>2000</swVersion>

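One way to reason about the question is to check each instance against the `xs:sequence` rules by hand. The Python sketch below is not a schema validator: it hard-codes the occurrence constraints from the excerpt and assumes `software` is treated as the root element, since the excerpt declares `swVersion` only locally inside it. In practice a schema-aware validator (for example lxml's `etree.XMLSchema`) would do this job.

```python
import xml.etree.ElementTree as ET

# Occurrence rules from the xs:sequence, in declared order:
# (element name, minOccurs, maxOccurs); None stands for "unbounded".
SOFTWARE_SEQUENCE = [
    ("swName", 1, 1),
    ("swVersion", 0, 1),
    ("swType", 1, 1),
    ("swOtherInformation", 0, None),
    ("swDependency", 0, None),
]


def matches_sequence(elem, sequence):
    """Check elem's children against ordered occurrence rules."""
    children = [child.tag for child in elem]
    pos = 0
    for name, min_occurs, max_occurs in sequence:
        count = 0
        while pos < len(children) and children[pos] == name:
            count += 1
            pos += 1
        if count < min_occurs or (max_occurs is not None and count > max_occurs):
            return False
    # Anything left over is out of order or not declared at all.
    return pos == len(children)


def check_software(xml_text):
    root = ET.fromstring(xml_text)
    return root.tag == "software" and matches_sequence(root, SOFTWARE_SEQUENCE)


FIRST = """<software>
  <swName>Windows</swName>
  <swVersion>2000</swVersion>
  <swType>Operating System</swType>
</software>"""
SECOND = "<swVersion>2000</swVersion>"
print(check_software(FIRST), check_software(SECOND))
```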
Resource Description Framework
 A language for describing resources on the Web
 Structure based on “triples”
 Designed to be read by computers, not humans
 An ontology language to support semantic interoperability (understanding meanings)
 Considered an essential part of the Semantic Web
 Can be expressed using XML
[Diagram: a triple links a Subject to an Object through a Predicate]
http://www.w3.org/RDF

Some RDF Concepts

 A Resource is anything you want to describe
 A Class is a category; it is a set that comprises individuals
 A Property is a Resource that has a name, such as "creator" or "homepage"
 A Property value is the value of a Property, such as "Barack Obama" or "http://dublincore.org" (note that a property value can be another resource)
RDF Statements

 The combination of a Resource, a Property, and a Property value forms a Statement (the subject, predicate and object of the Statement), also known as a “triple”
 An example Statement: "The editor of http://dublincore.org/documents/usageguide/ is Diane Hillmann"
 The subject of the statement above is: http://dublincore.org/documents/usageguide/
 The predicate is: editor
 The object is: Diane Hillmann
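The example statement can be held as a plain (subject, predicate, object) tuple, and a small set of such tuples already behaves like a queryable graph. A minimal Python sketch; the predicates are shortened to plain words here, whereas in real RDF each predicate is identified by a URI. The title and publisher triples come from the XHTML example earlier in this session.

```python
USAGE_GUIDE = "http://dublincore.org/documents/usageguide/"

# Each statement is one (subject, predicate, object) triple.
graph = {
    (USAGE_GUIDE, "editor", "Diane Hillmann"),
    (USAGE_GUIDE, "title", "Using Dublin Core"),
    (USAGE_GUIDE, "publisher", "Dublin Core Metadata Initiative"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)}


print(match(graph, p="editor"))
```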
RDF and OWL

 RDF does not have the language to specify all relationships
 Web Ontology Language (OWL) can specify richer relationships, such as equivalence, inverse, unique
 RDF and OWL may be used together
 RDFS (RDF Schema): a vocabulary for expressing relationships between elements

An RDF/XML Example

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.dlib.org">
<dc:title>D-Lib Program - Research in Digital Libraries</dc:title>
<dc:description>The D-Lib program supports the community of people
with research interests in digital libraries and electronic
publishing.</dc:description>
<dc:publisher>Corporation For National Research Initiatives</dc:publisher>
<dc:date>1995-01-07</dc:date>
<dc:subject>
<!-- Note: rdf:Bag is an unordered list -->
<rdf:Bag>
<rdf:li>Research; statistical methods</rdf:li>
<rdf:li>Education, research, related topics</rdf:li>
<rdf:li>Library use Studies</rdf:li>
</rdf:Bag>
</dc:subject>
<dc:type>World Wide Web Home Page</dc:type>
<dc:format>text/html</dc:format>
<dc:language>en</dc:language>
</rdf:Description>
</rdf:RDF>

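The RDF/XML example reduces to a list of triples. The minimal Python sketch below handles only `rdf:Description` with `rdf:about`, literal values and `rdf:Bag`; it is not a general RDF/XML parser (a library such as rdflib would be used for real data). Following RDF semantics, the Bag becomes a blank node typed `rdf:Bag` whose members are `rdf:_1`, `rdf:_2`, and so on; the blank-node label `_:bag1` is arbitrary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into a full property URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def rdfxml_triples(xml_text):
    """Teaching sketch: most of the RDF/XML syntax is not handled."""
    triples = []
    bag_count = 0
    root = ET.fromstring(xml_text)
    for desc in root.iter("{%s}Description" % RDF):
        subject = desc.get("{%s}about" % RDF)
        for prop in desc:
            bag = prop.find("{%s}Bag" % RDF)
            if bag is None:  # a plain literal value
                triples.append((subject, expand(prop.tag), (prop.text or "").strip()))
                continue
            # A Bag is a blank node typed rdf:Bag with members rdf:_1, rdf:_2, ...
            bag_count += 1
            node = "_:bag%d" % bag_count
            triples.append((subject, expand(prop.tag), node))
            triples.append((node, RDF + "type", RDF + "Bag"))
            for i, member in enumerate(bag.findall("{%s}li" % RDF), start=1):
                triples.append((node, "%s_%d" % (RDF, i), (member.text or "").strip()))
    return triples


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.dlib.org">
    <dc:title>D-Lib Program - Research in Digital Libraries</dc:title>
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>"""

for triple in rdfxml_triples(SAMPLE):
    print(triple)
```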
Overview of container formats

 A container format is needed to package together all forms of metadata and digital content
 Use of a container is compatible with, and an implementation of, the OAIS information package concept
 METS: packages metadata with objects or links to objects, and defines structural relationships
 MPEG-21 DID: represents digital objects using a flexible and expressive model
Metadata Encoding &
Transmission Standard (METS)

 Developed by the Digital Library Federation, maintained by the Library of Congress
 “... an XML document format for encoding metadata necessary for both management of digital library objects within a repository and exchange of such objects between repositories (or between repositories and their users).”
 Records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata
http://www.loc.gov/standards/mets/

METS Usage

 To package metadata with a digital object in XML syntax
 For retrieving, storing, preserving and serving the resource
 For interchange of digital objects with metadata
 As an information package in a digital repository (may be a unit of storage or a transmission format)

Characteristics of METS

 Open, non-proprietary standard
 Extensible
 Modular
 Developed by the digital library community

METS Sections

Defined in the METS schema for navigation & browsing:
1. Header (XML Namespaces)
2. File inventory
3. Structural Map & Links
4. Descriptive Metadata (not part of METS but uses an externally developed descriptive metadata standard, e.g. MODS)
5. Administrative Metadata (points to external schemas):
   1. Technical, Source
   2. Digital Provenance
   3. Rights

The structure of a METS file

METS
 fileSec: file inventory
 dmdSec: descriptive metadata
 amdSec: administrative metadata
 behaviorSec: behavior metadata
 structMap: structural map
Linking in METS Documents (XML ID/IDREF links)
[Diagram: ID/IDREF links among DescMD (mods, relatedItem), AdminMD (relatedItem; techMD, sourceMD, digiprovMD, rightsMD), fileGrp (file, file) and StructMap (div, div, fptr, div, fptr)]

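The ID/IDREF links are what let software move between sections: a `div` in the structMap uses `DMDID` to point at a dmdSec, and its `fptr` uses `FILEID` to point at a `file`. The Python sketch below follows those links in a small hypothetical two-page METS document (the IDs and file names are invented); it uses elements and attributes from the METS schema, but it is an illustration rather than a complete METS reader.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}

METS_DOC = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec>
    <mets:fileGrp USE="master">
      <mets:file ID="F1" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="page1.tif"/>
      </mets:file>
      <mets:file ID="F2" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="page2.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book" DMDID="DMD1">
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="F1"/></mets:div>
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="F2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_files(mets_text):
    """Follow each structMap fptr/@FILEID (an IDREF) to the file with that ID."""
    root = ET.fromstring(mets_text)
    href = "{%s}href" % NS["xlink"]
    # ID -> file location, taken from each file's FLocat element.
    files = {f.get("ID"): f.find("mets:FLocat", NS).get(href)
             for f in root.iterfind(".//mets:file", NS)}
    struct_map = root.find("mets:structMap", NS)
    return [(div.get("ORDER"), files[fptr.get("FILEID")])
            for div in struct_map.iter("{%s}div" % NS["mets"])
            for fptr in div.findall("mets:fptr", NS)]


def dangling_dmdids(mets_text):
    """DMDID values in the document that match no dmdSec ID."""
    root = ET.fromstring(mets_text)
    ids = {d.get("ID") for d in root.findall("mets:dmdSec", NS)}
    return [div.get("DMDID") for div in root.iter("{%s}div" % NS["mets"])
            if div.get("DMDID") and div.get("DMDID") not in ids]


print(page_files(METS_DOC))
```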
METS extension schemas

 “Wrappers” or “sockets” where elements from other schemas can be plugged in
 Provides extensibility
 Uses the XML Schema facility for combining vocabularies from different Namespaces
 Endorsed extension schemas:
 Descriptive: MODS, DC, MARCXML
 Technical metadata: MIX (image); textMD (text)
 Preservation related: PREMIS

Descriptive Metadata Section
(dmdSec)

Two methods: Reference and Wrap

<mets>
<dmdSec></dmdSec>
<fileSec></fileSec>
<structMap></structMap>
</mets>

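The two methods differ in where the descriptive record lives: `mdWrap` embeds it inside the METS document (in `xmlData`), while `mdRef` points to it elsewhere. A minimal Python sketch using ElementTree; the title, the IDs and the example.org URL are hypothetical, and the MODS record is cut down to a single title.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)  # prefixes used when serializing


def dmd_wrap(dmd_id, title):
    """Wrap: embed a minimal MODS record inside the dmdSec."""
    sec = ET.Element("{%s}dmdSec" % METS, ID=dmd_id)
    wrap = ET.SubElement(sec, "{%s}mdWrap" % METS, MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, "{%s}xmlData" % METS)
    mods = ET.SubElement(xml_data, "{%s}mods" % MODS)
    title_info = ET.SubElement(mods, "{%s}titleInfo" % MODS)
    ET.SubElement(title_info, "{%s}title" % MODS).text = title
    return sec


def dmd_ref(dmd_id, url):
    """Reference: point to a MODS record kept outside the METS document."""
    sec = ET.Element("{%s}dmdSec" % METS, ID=dmd_id)
    ET.SubElement(sec, "{%s}mdRef" % METS,
                  {"LOCTYPE": "URL", "MDTYPE": "MODS", "{%s}href" % XLINK: url})
    return sec


wrapped = dmd_wrap("DMD1", "Portrait of Louis Armstrong")
referenced = dmd_ref("DMD2", "http://example.org/mods/armstrong.xml")
print(ET.tostring(wrapped, encoding="unicode"))
print(ET.tostring(referenced, encoding="unicode"))
```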
METS examples

 METS with MODS
 Recorded event
 METS with MODS, PREMIS and MIX
 Portrait of Louis Armstrong (XML)
 Portrait of Louis Armstrong (presentation)

MPEG-21 Digital Item
Declaration (DID)
 ISO/IEC 21000-2: Digital Item Declaration
 An alternative way to represent Digital Objects
 Starting to be supported by some repositories, e.g., aDORe, DSpace, Fedora
 A flexible and expressive model that easily represents compound objects (recursive “item”)
 MPEG-21 DID is an ISO standard and has industry support, but it is often implemented in a proprietary way and its standards development is closed; METS is an open standard, developed through open discussion, mainly in the cultural heritage community

Abstract Model for MPEG-21 DID
 container: grouping of items and descriptor/statement constructs pertaining to the container
 item: represents a Digital Item (aka Digital Object, aka asset); descriptor/statement constructs convey information about the Digital Item
 component: binding of descriptor/statements to datastreams
 resource: datastream
[Diagram: a container with its descriptor/statement holds items; each item has its own descriptor/statement and components; each component has a descriptor/statement and one or more resources]

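The recursive item is what makes the model good at compound objects. A minimal Python sketch of the four constructs, with descriptors reduced to plain strings (an assumption for brevity) and a hypothetical journal issue whose articles are sub-items; `all_resources` walks the nesting to find every datastream.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    ref: str  # where the datastream lives
    mime_type: str


@dataclass
class Component:
    descriptors: List[str] = field(default_factory=list)
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Item:
    descriptors: List[str] = field(default_factory=list)
    components: List[Component] = field(default_factory=list)
    items: List["Item"] = field(default_factory=list)  # items can nest recursively


@dataclass
class Container:
    descriptors: List[str] = field(default_factory=list)
    items: List[Item] = field(default_factory=list)


def all_resources(item: Item) -> List[Resource]:
    """Collect the datastreams of an item and, recursively, of its sub-items."""
    found = [r for c in item.components for r in c.resources]
    for sub in item.items:
        found.extend(all_resources(sub))
    return found


# Hypothetical journal issue: the issue is an item whose articles are sub-items.
article1 = Item(["Article 1"], [Component([], [Resource("a1.pdf", "application/pdf")])])
article2 = Item(["Article 2"], [Component([], [Resource("a2.pdf", "application/pdf"),
                                               Resource("a2.xml", "text/xml")])])
issue = Item(["Issue"], [], [article1, article2])
shelf = Container(["Back issues"], [issue])
```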
Exercise

 Encode your resource in DC and MODS using XML
 Use the template forms provided

5. Applying metadata standards: Application profiles
Metadata Standards and Applications Workshop
Goals of Session
 Learn how metadata standards are applied and used:
 Learn about the concept and use of application profiles
 Learn about how different metadata standards may be used together in digital library applications

Overview of session

 Use of application profiles
 Dublin Core
 METS
 MODS
 Case study: using metadata standards together based on an application profile

Why Application profiles?

 Describes the set of metadata elements,
policies, and guidelines defined for a
particular application, implementation, or
object type
 Declares the metadata terms an organization,
information resource, application, or user
community uses in its metadata
 Documents metadata standards used in
instances, including schemas and controlled
vocabularies, policies, required elements, etc.
 Called “application profile” or just “profile”
Function of Application
Profiles
 Many metadata standards are sufficiently flexible
that they need a mechanism to impose some
constraints
 Profiles allow expression of the decisions made for a
project in machine-readable form (XML or RDF)
 Profiles allow for enforcing those decisions
 This facilitates interoperability and common practices
 Refining
 A narrower interpretation of a standard to suit your
project
 Combining
 Mixing elements from different standards
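Enforcing a profile's decisions can be sketched in a few lines of Python. The profile below is a plain dictionary standing in for the XML or RDF expression a real application profile would use; the element names, obligations and vocabulary are assumptions chosen for illustration.

```python
# A hypothetical profile; element names, obligations and vocabulary are illustrative.
PROFILE = {
    "title": {"required": True, "repeatable": False},
    "creator": {"required": False, "repeatable": True},
    "type": {"required": True, "repeatable": False,
             "vocabulary": {"Text", "Image", "Sound"}},
}


def validate(record: dict) -> list:
    """Check a record ({element: [values]}) against PROFILE.

    Returns a list of problems; an empty list means the record conforms.
    """
    problems = []
    for element, rule in PROFILE.items():
        values = record.get(element, [])
        if rule["required"] and not values:
            problems.append(f"missing required element: {element}")
        if not rule["repeatable"] and len(values) > 1:
            problems.append(f"element not repeatable: {element}")
        for value in values:
            if "vocabulary" in rule and value not in rule["vocabulary"]:
                problems.append(f"{element}: '{value}' not in controlled vocabulary")
    # Refining: elements outside the profile are rejected, even if the base standard allows them.
    for element in record:
        if element not in PROFILE:
            problems.append(f"element not in profile: {element}")
    return problems


print(validate({"title": ["Rice"], "type": ["Image"]}))
print(validate({"type": ["Photo"], "subject": ["Rice"]}))
```

The first record conforms; the second is reported as missing its title, using a type outside the vocabulary, and carrying an element the profile does not allow.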

Components of an
Application Profile
 Human readable documentation
 Property descriptions and relationships
 Domain or project specific instruction
 Obligation and constraints
 Machine-readable versions may contain:
 Specific encoding decisions and XML or RDF schemas
 Models of data relationships specific to the AP
represented in the schemas
 Functional requirements and use cases supporting
decisions

Using Properties from
other Schemas
 DC APs set stringent requirements for
determining reusability of terms:
 Is the term a real “property” and defined as such
within the source schema?
 Is the term declared properly, with a URI and
adequate documentation and support?
 In general, properties whose meaning is partly or
wholly determined by their place in a hierarchy are
not appropriate for reuse in DC APs without
reference to the hierarchy.
 Other styles of profiles have different
requirements and strategies for developing
machine-readability and validation
Documenting new
properties
 Minimum: a web page, with the
relevant information available to
other implementations
 Better: a web page and an
accessible schema using your terms
as part of your application profile
 Best: all terms available on a
distributed registry
Singapore Framework

 A framework for designing metadata
applications for maximum interoperability
 Defines a set of descriptive components that
are necessary for documenting an Application
Profile
 Forms a basis for reviewing Dublin Core
application profiles
 Relates APs to standard domain models and
Semantic Web standards
 http://dublincore.org/documents/singapore-framework/

DC Application Profile
Examples
 Collections AP
 http://www.dublincore.org/groups/collections/collection-application-profile/2007-03-09/
 Scholarly Works Application Profile (SWAP)
 http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
 Both these have been reviewed by the DC
Usage Board and are deemed compliant
with the DC Abstract Model

An RDA Application Profile

 A DCMI/RDA Task Group has been defining
RDA properties and value vocabularies as
formal RDF vocabularies (with URIs)
 IFLA has stated an intention to declare FRBR
entities and attributes as well
 Next step is a DC application profile of RDA
according to the Singapore Framework
 See http://metadataregistry.org for the
provisionally registered properties/vocabularies

METS Profiles
 Description of a class of METS documents
 Provides document authors and programmers with
guidance to create and process conformant METS
documents
 An XML document conforming to a schema
 Expresses the requirements that a METS document
must satisfy
 A “data standard” in its own right
 A sufficiently explicit METS Profile may be
considered a “data standard”
 METS Profiles are written in human-readable prose
and are not intended to be “machine actionable” (but they
use a standard XML schema)
Components of a METS
Profile
1. Unique URI
2. Short Title
3. Abstract
4. Date and time of creation
5. Contact Information
6. Related profiles
7. Extension schemas
8. Rules of description
9. Controlled vocabularies
10. Structural requirements
11. Technical requirements
12. Tools and applications
13. Sample document
Case study of a METS
Profile
 LC Audio Compact Disc Profile
 Features:
 Specifies MODS for descriptive metadata
 Specifies AACR2 as the rules of description
 Specifies controlled vocabularies used in various
elements
 dmdSec requirements 2 and 3 specify use of
relatedItem type=“constituent” if there are multiple
works on the CD
 Specifies how to detail the physical structure,
whether multiple CDs or multiple tracks on a CD
(structMap requirements 2 and 3)

MODS Profiles

 Some applications are establishing MODS
profiles to document usage, required
elements, controlled vocabularies used,
etc.
Some examples:
 DLF Aquifer MODS profile: establishes
implementation guidelines for rich shared
metadata for cultural heritage materials
 British Library electronic journal MODS
profile
Using metadata standards
together: a case study
 METS can be used to package together
the metadata with the objects
 METS allows for use of any XML metadata
schema in its extensions
 MODS works well with METS for descriptive
metadata and can be associated with any
level of the description
 Technical metadata can be inserted and
associated with specific files
 METS can be used as a digital library
application if objects are based on a
profile and thus are consistent
<dmdSec> with MODS Extension Schema
<mets:mets>
  …
  <mets:dmdSec>  <!-- descriptive metadata section -->
    <mets:mdWrap>  <!-- MODS data contained inside the metadata wrap section -->
      <mets:xmlData>
        <mods:mods></mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>
Prefixes before element names (mets:, mods:) identify the schema each element belongs to.
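The same wrapping can be produced programmatically. This is a minimal sketch using Python's standard xml.etree.ElementTree, assuming the usual METS and MODS v3 namespace URIs; the ID value "DMD1" is hypothetical, and the sketch omits the structMap that the METS schema requires, so its output is not a complete, valid METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
# Registering prefixes makes the serializer emit mets: and mods: instead of ns0:, ns1:.
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def q(ns: str, tag: str) -> str:
    """Return an ElementTree qualified name, e.g. '{http://www.loc.gov/METS/}dmdSec'."""
    return f"{{{ns}}}{tag}"


def mets_with_mods(title: str) -> ET.Element:
    """Build mets/dmdSec/mdWrap/xmlData with a minimal MODS record inside."""
    mets = ET.Element(q(METS_NS, "mets"))
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title
    return mets


doc = mets_with_mods("Bernstein conducts Beethoven")
print(ET.tostring(doc, encoding="unicode"))
```

Registering the prefixes is what keeps the serialized output readable with the same mets:/mods: prefixes used on the slide.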
<mods:mods>
<mods:titleInfo>
<mods:title>Bernstein conducts Beethoven </mods:title>
</mods:titleInfo>
<mods:name>
<mods:namePart>Bernstein, Leonard</mods:namePart>
</mods:name>
<mods:relatedItem type="constituent">
<mods:titleInfo>
<mods:title>Symphony No. 5</mods:title>
</mods:titleInfo>
<mods:name>
<mods:namePart>Beethoven, Ludwig van</mods:namePart>
</mods:name>
<mods:relatedItem type="constituent">
<mods:titleInfo>
<mods:partName>Allegro con moto</mods:partName>
</mods:titleInfo>
</mods:relatedItem>
<mods:relatedItem type="constituent">
<mods:titleInfo>
<mods:partName>Adagio</mods:partName>
</mods:titleInfo>
</mods:relatedItem>
</mods:relatedItem>
</mods:mods>
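Because constituent relatedItems nest recursively, a short recursive walk recovers the whole tree. A sketch using Python's standard xml.etree.ElementTree on an abridged copy of the record above (the name elements are left out); the function names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Bernstein conducts Beethoven</mods:title></mods:titleInfo>
  <mods:relatedItem type="constituent">
    <mods:titleInfo><mods:title>Symphony No. 5</mods:title></mods:titleInfo>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Allegro con moto</mods:partName></mods:titleInfo>
    </mods:relatedItem>
    <mods:relatedItem type="constituent">
      <mods:titleInfo><mods:partName>Adagio</mods:partName></mods:titleInfo>
    </mods:relatedItem>
  </mods:relatedItem>
</mods:mods>"""


def label(node):
    """Prefer the title; fall back to partName (as in the movement entries)."""
    info = node.find("mods:titleInfo", NS)
    if info is not None:
        for tag in ("mods:title", "mods:partName"):
            text = info.findtext(tag, namespaces=NS)
            if text:
                return text.strip()
    return "(untitled)"


def outline(node, depth=0):
    """Yield (depth, label) pairs, descending into constituent relatedItems."""
    yield depth, label(node)
    for child in node.findall("mods:relatedItem[@type='constituent']", NS):
        yield from outline(child, depth + 1)


root = ET.fromstring(SAMPLE)
for depth, text in outline(root):
    print("  " * depth + text)
```

The printed outline indents the symphony under the recording and the two movements under the symphony.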
Use of MODS relatedItem
type=“constituent”
 A top-level child element of <mods:mods>
 The relatedItem element uses the MODS content model:
 titleInfo, name, subject, physicalDescription,
note, etc.
 Makes it possible to create rich analytics for
works contained within a MODS record
 Repeatable and recursively nestable
 Making it possible to build a hierarchical tree
structure
 Makes it possible to associate descriptive data
with any structural element
METS: Two Hierarchies, Logical & Physical
<mets:mets>
  <mets:dmdSec>  <!-- hierarchy representing "logical" structure (nested relatedItems) -->
    <mets:mdWrap>
      <mets:xmlData>
        <mods:mods>
          <mods:relatedItem>
            <mods:relatedItem></mods:relatedItem>
          </mods:relatedItem>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:fileSec></mets:fileSec>
  <mets:structMap>  <!-- hierarchy representing "physical" structure (nested div elements) -->
    <mets:div>
      <mets:div></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
Multiple Inputs to Common Data Format

[Diagram: new digital objects, a legacy database, and a harvest of American Memory objects all feed profile-based METS objects, a common data format for searching and display]
Example: Using a profile
as an application
 METS Photograph Profile
 William P. Gottlieb Collection
Portrait of Louis Armstrong
 Photographic object
Converted a file of 1600 MARC records to XML
(a single MODS modsCollection file) using marc4j.
Used an XSLT stylesheet to create 1600
records conforming to the
METS Photograph Profile.
Logical & Physical Relationships
Logical (MODS): mods:mods and mods:relatedItem type="otherVersion"
elements create a sequence of 3 nodes
<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo>
      <mods:title>Derivative Work 1</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo>
      <mods:title>Derivative Work 2</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>
Physical (METS structMap): div TYPE="photo:version" elements correspond
to the 3 nodes using a logical sequence of ID to DMDID relationships
<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10090"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver03">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN1009F"/>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
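Following the ID-to-DMDID links is what lets an application pair each physical version with its logical description. A sketch with Python's standard ElementTree, run on a trimmed, hypothetical document that combines the two fragments above (two versions only). It assumes each DMDID holds a single ID, although METS allows a space-separated list.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

DOC = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3">
 <mets:dmdSec ID="MODS1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
  <mods:mods ID="ver01">
   <mods:titleInfo><mods:title>Original Work</mods:title></mods:titleInfo>
   <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo><mods:title>Derivative Work 1</mods:title></mods:titleInfo>
   </mods:relatedItem>
  </mods:mods>
 </mets:xmlData></mets:mdWrap></mets:dmdSec>
 <mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
   <mets:div TYPE="photo:version" DMDID="ver01">
    <mets:div TYPE="photo:image"><mets:fptr FILEID="FN10081"/></mets:div>
   </mets:div>
   <mets:div TYPE="photo:version" DMDID="ver02">
    <mets:div TYPE="photo:image"><mets:fptr FILEID="FN10090"/></mets:div>
   </mets:div>
  </mets:div>
 </mets:structMap>
</mets:mets>"""

root = ET.fromstring(DOC)
MODS_TAGS = {f"{{{NS['mods']}}}mods", f"{{{NS['mods']}}}relatedItem"}

# Logical side: index every MODS node that carries an ID by that ID.
titles = {
    node.get("ID"): node.findtext("mods:titleInfo/mods:title", namespaces=NS)
    for node in root.iter()
    if node.tag in MODS_TAGS and node.get("ID")
}

# Physical side: each version div points at a logical node through DMDID.
resolved = []
for div in root.iterfind(".//mets:div[@TYPE='photo:version']", NS):
    dmdid = div.get("DMDID")
    files = [fptr.get("FILEID") for fptr in div.iterfind(".//mets:fptr", NS)]
    resolved.append((dmdid, titles[dmdid], files))

for row in resolved:
    print(*row)
```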
Using a METS profile-based
approach
 Ability to model complex library objects
 Use of open source software tools
 Use of XML for data creation, editing, storage and
searching
 Use of XSLT for…
 Legacy data conversion
 Batch METS creation and editing
 Web displays and behaviors
 Creation of multiple outputs from XML
 HTML/XHTML for Web display; PDF for printing
 Ability to aggregate disparate data sources into a
common display
Closing thoughts on
application profiles
 Many metadata standards are sufficiently
flexible that profiling is necessary
 Documenting what is used in an application
simplifies and enhances data presentation,
conversion from other sources, and the ability to
provide different outputs
 Constraining a metadata standard by
specifying what is used and how facilitates
data exchange and general interoperability

Exercise: critique an
application profile
 University of Maryland Descriptive
Metadata
http://www.lib.umd.edu/dcr/publications/taglibrary/umdm.html
 UVa DescMeta
http://lib.virginia.edu/digital/metadata/descriptive.html
 Texas Digital Library profile for electronic
theses and dissertations
http://www.tdl.org/documents/ETD_MODS_profile.pdf

Exercise: Questions to address
 Does the profile define its user community and
expected uses?
 How usable would the profile be for a potential
implementer?
 How (well) does the profile specify element/term
usage?
 How (well) does the profile define and manage
controlled vocabularies?
 Does the profile use existing metadata standards?
 Are there key anomalies, omissions, or
implementation concerns?

6. Controlled vocabularies
Metadata Standards and
Applications Workshop
Goals of Session

 Understand how different controlled vocabularies are
used in metadata
 Learn about relationships between terms in thesauri
 Understand methods of encoding vocabularies
 Learn about how registries are used to document
vocabularies

Why controlled vocabularies?

 Document values that occur in metadata
 Goal is to reduce ambiguity
 Allow for control of synonyms
 Establish formal relationships among
terms (where appropriate)
 Test and validate terms
 Role of metadata registries

Why bother?

 To improve retrieval, i.e., to get an
optimum balance of precision and recall
 Precision – How many of the retrieved
records are relevant?
 Recall – How many of the relevant
records did you retrieve?
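The two definitions reduce to simple ratios over sets of record identifiers: precision divides the hits by everything retrieved, and recall divides the hits by everything relevant. A minimal Python sketch; the record numbers are invented for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 10 records retrieved; 8 relevant records exist; 6 are in both sets.
retrieved = set(range(1, 11))              # records 1..10
relevant = {1, 2, 3, 4, 5, 6, 20, 21}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```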

Improving Recall and
Precision
 Controlled Vocabularies improve recall by
addressing synonyms [attire vs. dress vs.
clothing]

 Controlled Vocabularies improve precision
by addressing homographs [bridge
(game) vs. bridge (structure) vs. bridge
(dental device)]

Types of Controlled
Vocabularies
 Lists of enumerated values
 Synonym rings
 Taxonomy
 Thesaurus
 Classification Schemes
 Ontology

Lists

A list is a simple group of terms


Example:
Alabama
Alaska
Arizona
Arkansas
California
Colorado
....
Frequently used in Web site pick lists and
pull down menus

Synonym Rings
 Synonym rings are used to expand queries for
content objects
 If a user enters any one of these terms as a query to the
system, all items are retrieved that contain any of the terms
in the cluster
 Synonym rings are often used in systems where the
underlying content objects are left in their
unstructured natural language format
 the control is achieved through the interface by drawing
together similar terms into these clusters
 Synonym rings are used in conjunction with search
engines and provide a minimal amount of control of
the diversity of the language found in the texts of the
underlying documents
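Query expansion with synonym rings can be sketched in Python as a lookup that replaces a query term with its whole cluster before matching. The first ring reuses the attire/dress/clothing example given earlier; the second ring and the documents are hypothetical, and real systems would also normalize punctuation and word forms.

```python
# Hypothetical synonym rings: unordered clusters with no preferred term.
RINGS = [
    {"attire", "dress", "clothing"},
    {"car", "automobile", "auto"},
]


def expand(query_term: str) -> set:
    """Return every term in the ring containing query_term, or just the term itself."""
    term = query_term.lower()
    for ring in RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(query_term: str, documents: dict) -> list:
    """Retrieve documents whose text contains any term in the expanded query."""
    terms = expand(query_term)
    return [doc_id for doc_id, text in documents.items()
            if terms & set(text.lower().split())]


docs = {
    "d1": "Victorian clothing and hats",
    "d2": "Evening dress patterns",
    "d3": "Bridge construction",
}
print(search("attire", docs))  # ['d1', 'd2']
```

The underlying text is never changed; all of the control lives in the interface's expansion step, as described above.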

Taxonomies

A taxonomy is a set of preferred
terms, all connected by a hierarchy
or polyhierarchy
Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in web navigation
systems
Thesauri

A thesaurus is a controlled vocabulary
with multiple types of relationships
Example:
Rice
  UF paddy
  BT Cereals
  BT Plant products
  NT Brown rice
  RT Rice straw

Ontology

 One definition: “An arrangement of
concepts and relations based on an
underlying model of reality.”
 Ex.: Organs, symptoms, and diseases
in medicine
 No real agreement on definition:
every community uses the term in a
slightly different way

Thesaural Relationships

Relationship types:
 Use/Used For – indicates preferred term
 Hierarchy – indicates broader and
narrower terms
 Associative – almost unlimited types of
relationships may be used

The thesaurus is the most complex format for
controlled vocabularies, and it is widely used.

Z39.19 Types of Concepts

 Things and their physical parts
 Materials
 Activities or processes
 Events or occurrences
 Properties or states of persons, things,
materials or actions
 Disciplines or subject fields
 Units of measurement
 Unique entities
Examples

 Birds (things)
 Ornithology (discipline)
 Feathers (materials)
 Flying (activity or process)
 Bird counts (event)
 Barn Owl (unique entity)

Relationships

 Equivalence

 Hierarchical

 Associative

Equivalence Relationships

Term A and Term B overlap completely

A=B

Hierarchical Relationships

 Term A is included in Term B

[Diagram: term A shown inside term B]

Associative Relationships

 Semantics of terms A and B overlap

[Diagram: terms A and B shown partially overlapping]

Expressing Relationship

Relationship             Rel. Indicator   Abbreviation
Equivalence (synonymy)   Use              None or U
                         Used for         UF
Hierarchy                Broader term     BT
                         Narrower term    NT
Association              Related term     RT
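Each indicator in the table has a reciprocal (Use/Used for, BT/NT, RT/RT), so a thesaurus maintainer usually records one direction and derives the other. A Python sketch of that derivation using the "Rice" entry shown earlier; writing the Use indicator as "USE" and storing entries as (term, indicator, term) triples are assumptions made for illustration.

```python
# Reciprocal of each relationship indicator from the table above.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


def add_reciprocals(assertions):
    """Given (term, indicator, term) triples, return them plus their inverses."""
    result = set(assertions)
    for subject, rel, obj in assertions:
        result.add((obj, RECIPROCAL[rel], subject))
    return result


# The "Rice" entry from the thesaurus example.
rice = [
    ("Rice", "UF", "paddy"),
    ("Rice", "BT", "Cereals"),
    ("Rice", "BT", "Plant products"),
    ("Rice", "NT", "Brown rice"),
    ("Rice", "RT", "Rice straw"),
]

# Print only the derived entries, e.g. "paddy USE Rice" and "Cereals NT Rice".
for triple in sorted(add_reciprocals(rice)):
    if triple[0] != "Rice":
        print(*triple)
```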

Vocabulary Management

 The degree of control over a vocabulary is
(mostly) independent of its type
 Uncontrolled – Anybody can add anything at any time
and no effort is made to keep things consistent
 Managed – Software makes sure there is a list that is
consistent (no duplicates, no orphan nodes) at any one
time. Almost anybody can add anything, subject to
consistency rules
 Controlled – A documented process is followed for the
update of the vocabulary. Few people have authority to
change the list. Software may help, but emphasis is on
human processes and custodianship
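The consistency rules of a managed vocabulary are the kind of check software can run on every change. A Python sketch, assuming one possible set of definitions: a duplicate is a label that repeats ignoring case, and an orphan is a term whose broader-term link points to a term that is not in the list. The terms extend the taxonomy example; "Kevlar" and "Aramids" are invented.

```python
def check_vocabulary(terms, broader):
    """Report duplicate labels and orphan links in a small hierarchical vocabulary.

    terms:   list of term labels
    broader: {narrower term: broader term}; top terms simply have no entry
    """
    problems, seen = [], set()
    for term in terms:
        key = term.casefold()  # treat 'Nylon' and 'nylon' as the same label
        if key in seen:
            problems.append(f"duplicate: {term}")
        seen.add(key)
    for narrower, parent in broader.items():
        if parent.casefold() not in seen:
            problems.append(f"orphan: {narrower} points to missing term {parent}")
    return problems


terms = ["Chemistry", "Organic chemistry", "Polymer chemistry", "Nylon", "nylon", "Kevlar"]
broader = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
    "Kevlar": "Aramids",  # 'Aramids' was never added to the list
}
print(check_vocabulary(terms, broader))
```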
Informal Vocabularies

 A newer movement toward ‘bottom-up’
classification goes by many names:
 Tagging
 Social bookmarking
 Folksonomies
 Some in this movement, seeing problems
of scale, are moving toward more
formalization; others are reframing the
vocabulary issue
Libraries/Museums and
Tagging
 Penn Tags
 Still experimental, primarily internal to Penn
 http://tags.library.upenn.edu/help/
 Library of Congress Flickr project
 Open public tagging, still unclear how results will be used
 http://www.flickr.com/photos/library_of_congress/
 The Art Museum Social Tagging Project
 Research/software project focused on museum
application
 http://www.steve.museum/

Encoding Controlled
Vocabularies
 MARC 21
 Authority Format used for names, subjects,
series
 Classification Format used for formal
classification schemes
 MADS (a derivative of MARC)
 Simple Knowledge Organization System
(SKOS)
 Intended for concepts

New/Upcoming
Standards: Authorities
 Functional Requirements for Authority Data (FRAD)
 A new model for authority information
 Developed by the IFLA Working Group on Functional
Requirements and Numbering of Authority Records
(FRANAR)
 VIAF (Virtual International Authority File)
 Prototype at: http://orlabs.oclc.org/viaf/
 A Review of the Feasibility of an International
Authority Data Number (ISADN)
 Simple Knowledge Organization System (SKOS)—a
W3C standard

Functions of the Authority File

 Document decisions
 Serve as reference tool
 Control forms of access points
 Support access to bibliographic files
 Link bibliographic and authority
files
(Slide from Glenn Patton)

FRANAR Concept Model, top

FRANAR Concept Model, bottom

FRAD person attributes
From FRBR (AACR2 additions to names):
Dates associated with the person
Title of person
Other designation associated with the person
New:
Gender
Place of birth
Place of death
Country
Place of residence
Affiliation
Address
Language of person
Field of activity
Profession/occupation
Biography/history
(Slide from Ed Jones)

SKOS

 “SKOS Core provides a model for
expressing the basic structure and
content of concept schemes such as
thesauri, classification schemes, subject
heading lists, taxonomies, 'folksonomies',
other types of controlled vocabulary, and
also concept schemes embedded in
glossaries and terminologies.”
--SKOS Core Guide

SKOS & RDF

 A World Wide Web Consortium
(W3C) standard
 Based on RDF and OWL
 Data linked to and/or merged with
other data
 Data sources distributed across the
web
 http://www.w3.org/2004/02/skos/

The skos:Concept class allows you to assert that a
resource is a conceptual resource.
That is, the resource is itself a concept.

Preferred and Alternative Lexical
Labels

The RDF/XML Encoded
Version

Example of ISO 639-2 language code in SKOS
<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
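A record like this can be read with ordinary XML tools when its shape is known. The sketch below uses Python's standard ElementTree on a trimmed copy of the record above, wrapped in rdf:RDF so the prefixes are declared. It treats RDF/XML as plain XML, which only works for this fixed layout; the same triples can be serialized in other shapes, and a real RDF parser (for example the third-party rdflib package) would be the more robust choice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF, "skos": "http://www.w3.org/2008/05/skos#"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2008/05/skos#">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-latn">portugais</skos:altLabel>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>"""

concepts = {}
for desc in ET.fromstring(RECORD).findall("rdf:Description", NS):
    concepts[desc.get(f"{{{RDF}}}about")] = {
        "prefLabel": desc.findtext("skos:prefLabel", namespaces=NS),
        # altLabels keyed by their xml:lang tag
        "altLabels": {a.get(XML_LANG): a.text for a in desc.findall("skos:altLabel", NS)},
        "exactMatch": [m.get(f"{{{RDF}}}resource") for m in desc.findall("skos:exactMatch", NS)],
    }

for uri, data in concepts.items():
    print(uri, data)
```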
Registries: the Big Picture

(Adapted from Wagner & Weibel, “The Dublin Core Metadata Registry:
Requirements, Implementation, and Experience” JoDI, 2005)

Why Registries?

 Support interoperability
 Discovery of available schemes and schemas
for description of resources
 Promote reuse of extant schemes and
schemas
 Access to machine-readable and human-
readable services
 Support for crosswalking and translation
 Coping with different metadata schemes

Declaration, documentation,
publication
 To identify the source of a
vocabulary, e.g., to indicate in my metadata,
by means of a URI, that a term comes from LCSH
 To clarify a term and its definition
 To publish controlled vocabularies
and have access to information
about each term
Some uses for registries

 Metadata Schemas
 Crosswalks between metadata schemas
 Controlled Vocabularies
 Mappings between vocabularies
 Application Profiles
 Schema and vocabulary information in
combination

Metadata registries

 Some are formal, others are informal lists
 Some formal registries:
 Dublin Core registry of DC terms
 NSDL registry of vocabularies used
 Experiment at:
http://sandbox.metadataregistry.org
 LC is establishing registries
 MARC and ISO code lists
 Enumerated value lists
 LCSH in SKOS (example:
http://id.loc.gov/authorities/sh85118553)

Example from Dublin Core Registry—Term Level

7. Approaches to Models of
Metadata Creation, Storage and
Retrieval

Metadata Standards
and Applications
Goals of Session

 Understand the differences between
traditional and digital library approaches to:
 Metadata creation
 Storage
 Retrieval/discovery

Creating metadata records

 The “Library Model”
 Trained catalogers, one-at-a-time metadata
records
 The “Submission Model”
 Authors create metadata when submitting
resources
 The “Automated Model”
 Automated tools create metadata for
resources
 “Combination Approaches”
The Library Model

 Records created “by hand,” one at a time
 Shared documentation and content
standards (AACR2, etc.)
 Efficiencies achieved by sharing
information on commonly held
resources
 Not easily extended past the granularity
assumptions in current practice
The Submission Model

 Based on author- or user-generated
metadata
 Can be wildly inconsistent
 Submitters generally untrained
 May be expert in one area, clueless in
others
 Often requires editing support for usability
 Inexpensive, but may not be satisfactory as
the only option
The Automated Model

 Based largely on text analysis; doesn’t usually
extend well to non-text or low-text resources
 Requires development of appropriate
evaluation and editing processes to support
even minimal quality standards
 Still largely research; few large, successful
production examples
 One simple automated tool to try:
http://www.ukoln.ac.uk/metadata/dcdot/
 The automated model may be more efficient
for technical metadata
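The simplest form of automated creation just harvests what a page already declares. The toy sketch below uses Python's standard html.parser to collect a page's <title> and <meta> tags and map them onto three Dublin Core elements. It assumes DC.*-style meta names, falling back to generic "keywords" and "description" tags; it does no real text analysis and is not how DC-dot or any production tool works internally. The sample page is invented.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta.setdefault(attrs["name"].lower(), []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def guess_dc(html: str) -> dict:
    """Map harvested tags to DC, preferring explicit DC.* names over generic ones."""
    parser = MetaExtractor()
    parser.feed(html)
    title = parser.title.strip()
    return {
        "title": [title] if title else [],
        "subject": parser.meta.get("dc.subject", parser.meta.get("keywords", [])),
        "description": parser.meta.get("dc.description", parser.meta.get("description", [])),
    }


page = ('<html><head><title>Barn Owls</title>'
        '<meta name="keywords" content="owls; birds">'
        '<meta name="DC.description" content="A page about owls"></head></html>')
print(guess_dc(page))
```

Even this trivial harvester shows why the automated model needs evaluation and editing: the "subject" it returns is whatever the page author typed, uncontrolled.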

Combination Approaches

 Combination machine and human
created metadata
 Ex.: LC Web Archives
(http://www.loc.gov/minerva)
 Ex.: INFOMINE
(http://infomine.ucr.edu/)
 Combination metadata and
content indexing
 Ex.: NSDL (http://nsdl.org)
Content “Storage” Models

 ‘Storage models’ in this context refers
to the relationships between metadata
and content (not the systems that
purport to ‘store’ content for various
uses)
 These relationships affect how access to
the information is accomplished, and
whether the metadata helps or
hinders the process (or is irrelevant to
it)
Common ‘Storage’ Models

 Content with metadata
 Metadata only

Content with metadata
 Examples:
 HTML pages with embedded ‘meta’ tags
 Most content management systems (though
they may store only technical or structural
metadata)
 Text Encoding Initiative (TEI), a full-text
markup language (as an example of an
application, see the Comic Book Markup
Language at http://www.cbml.org/)
 Often proves difficult to update and not
optimized to manage metadata over time
Metadata only

 Library catalogs
 Web-based catalogs often provide some
services for digital content
 Electronic Resource Management Systems
(ERMS)
 Provide metadata records at the title level only
 Metadata aggregations
 Using APIs or OAI-PMH for harvesting and
redistribution

Service only

 Often supported partially or fully by metadata
 Google, Yahoo (and others)
 Sometimes provide both search services and
distributed search software
 Electronic journals (article level)
 Linked using ‘link resolvers’ or available
independently from Websites
 Have metadata behind their services but
don’t generally distribute it separately
Common Retrieval Models

 Library catalogs
 Web-based (“Amazoogle”)
 Portals and federations

Library Catalogs
 Based on a consensus that granular
metadata is useful
 Expectations of uniformity of information
content and presentation
 Designed to optimize recall and precision
 Relevance ranking and keyword searching add
limited value (the only ‘text’ available is
the metadata itself)
 Retrieval options limited by LMS vendor
decisions
New Library Catalogs
 ENDECA
 In 2006, North Carolina State University Libraries was
one of the first to experiment with new catalog
technologies using legacy metadata
 eXtensible Catalog Project
 University of Rochester is attempting to provide a
FRBR-ized catalog and integrated access to previously
“siloed” data managed by libraries

http://www.lib.ncsu.edu/
Web-based

 The “Amazoogle” model:
 Lorcan Dempsey: “Amazon, Google, eBay:
massive computational and data platforms which
exercise strong gravitational Web attraction.”
 Based primarily on full-text searching and link-
or usage-based relevance ranking (lots of recall,
little precision)
 Some efforts to combine catalog and Amazoogle
searches (ex.: collaborations with WorldCat)
 Google is using metadata
Portals and Federations

 Portals: defined content boundaries
 Some content also available elsewhere
 ex.: Specific library portals, subject portals like
Portals to the World
(http://www.loc.gov/rr/international/portals.html)
 Federations: protected content and services
 Often specialized services based on specifically
purposed metadata
 ex.: BEN (http://www.biosciednet.org/portal/)

XML based digital library
application
 Similar to a portal application
 May use a database for record creation and
maintenance
 Often uses open source tools
 Files are indexed for searching and
presented on the Web using an XML based
publishing framework
 Combines some of the other metadata
creation, storage and retrieval approaches
 http://www.loc.gov/performingarts/
Information Discovery and
Retrieval

 Z39.50

 SRU

 Federated search (Metasearch)

Z39.50

 An international (ISO 23950) standard
defining a protocol for computer-to-computer
information retrieval.
 Makes it possible for a user in one system
to search and retrieve information from
other computer systems (that have also
implemented Z39.50)
 Originally approved by the National
Information Standards Organization
(NISO) in 1988
SRU
Search/Retrieval via URL
 SRU is the successor to Z39.50
 SRU is a standard XML-focused search
protocol for Internet search queries,
utilizing CQL (Contextual Query
Language), a standard syntax for
representing queries
 To learn more about it, see:
http://www.loc.gov/standards/sru/index.html
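Because SRU carries the whole request in a URL, a search is just a query string. The Python sketch below builds a searchRetrieve request using the standard SRU parameter names (operation, version, query, maximumRecords, recordSchema) and a CQL query against the dc.title index. The base URL is hypothetical, and which record schemas a server accepts (here "dc") depends on that server.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str,
                   max_records: int = 10, record_schema: str = "dc") -> str:
    """Build an SRU 1.1 searchRetrieve URL; the query string is percent-encoded."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    return f"{base_url}?{urlencode(params)}"


# The CQL query pairs an index (dc.title) with a relation and a quoted term.
url = sru_search_url("http://sru.example.org/catalog", 'dc.title = "metadata standards"')
print(url)
```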
Federated search
 Some institutions are using
federated search (meta-search) to
search multiple data sources

 LC has a new limited version
available:
http://www.loc.gov/search/new/

Can You Tell?

 Can you tell what’s going on behind these sites?
 How are they organized?
 What creation and storage models are used?
 Plant and Insect Parasitic Nematodes:
http://nematode.unl.edu/
 Public Radio Market:
http://www.prms.org/
 Brown University Library Center for Digital Initiatives:
Alcohol, Temperance & Prohibition:
http://dl.lib.brown.edu/temperance/
 Country walkers:
http://www.countrywalkers.com/
8. Metadata
Interoperability and Quality
Issues

Metadata Standards and
Applications Workshop
Goals of Session

 Understand interoperability protocols
(OpenURL for reference linking, OAI-PMH)
 Understand crosswalking and
mapping as it relates to
interoperability
 Investigate issues concerning
metadata quality

Tools For Sharing
Metadata/Interoperability
 Protocols
 OpenURL for reference linking
 OAI-PMH for harvesting

 Good practices and documentation
 Crosswalking

What’s the Point of
Interoperability?
 For users, it’s about resource discovery
(user tasks)
 What’s out there?
 Is it what I need for my task?
 Can I use it?
 For resource creators, it’s about distribution
and marketing
 How can I increase the number of people who
find my resources easily?
 How can I justify the funding required to make
these resources available?
What’s an OpenURL?

 The OpenURL provides a standardized
format for transporting bibliographic
metadata about objects between
information services
 Provides a basis for building services via
the notion of an extended service-link,
which moves beyond the classic notion of
a reference link (a link from metadata to
the full-content described by the
metadata)

Additional Open URL
Services
 Link from a record in an abstracting and indexing
database (A&I) to the full-text described by the
record
 Link from a record describing a book in a library
catalogue to a description of the same book in an
Internet book shop
 Link from a reference in a journal article to a
record matching that reference in an A&I
database
 Link from a citation in a journal article to a record
in a library catalogue that shows the library
holdings of the cited journal
OpenURL Examples &
Demo
 http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
 An OpenURL demo:
 http://www.ukoln.ac.uk/distributed-systems/openurl/
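The example link above is simply a resolver address followed by the citation as key/value pairs (the pre-standard "OpenURL 0.1" style; the later Z39.88-2004 format uses prefixed keys such as rft.issn instead). A Python sketch that builds the same link and, as a link resolver would, reads the citation back out; the function name is illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def openurl_01(resolver: str, **citation: str) -> str:
    """Build an OpenURL 0.1-style link: the citation travels as query parameters."""
    return f"{resolver}?{urlencode(citation)}"


link = openurl_01("http://sfxserver.uni.edu/sfxmenu",
                  issn="1234-5678", date="1998", volume="12", issue="2", spage="134")
print(link)
# http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134

# A link resolver does the reverse: it recovers the citation metadata from the query string.
citation = {key: values[0] for key, values in parse_qs(urlparse(link).query).items()}
```

Because the link carries metadata rather than a fixed target, each institution's resolver can decide where the citation should lead (full text, catalog holdings, a bookshop), which is the "extended service-link" idea.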

OAI-PMH
 Open Archives Initiative Protocol for Metadata
Harvesting (http://www.openarchives.org/)
 Roots in the ePrint community, although
applicability is much broader
 Mission: “The Open Archives Initiative
develops and promotes interoperability
standards that aim to facilitate the efficient
dissemination of content.”
 Content in this context is actually “metadata
about content”

OAI-PMH in a Nutshell

 Essentially provides a simple protocol for
“harvest” and “exposure” of metadata
records
 Specifies a simple “wrapper” around
metadata records, providing metadata
about the record itself
 OAI-PMH is about the metadata, not about
the resources
ARTstor CDWA Lite experiment
http://www.artstor.org/index.shtml
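The "wrapper" is easiest to see by pulling a record apart. The Python sketch below parses a hand-made record shaped like those in an OAI-PMH GetRecord or ListRecords response, using the OAI-PMH 2.0, oai_dc, and Dublin Core namespaces. The identifier, datestamp, set name and field values are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1234</identifier>
    <datestamp>2009-02-01</datestamp>
    <setSpec>photos</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Portrait of Louis Armstrong</dc:title>
      <dc:type>Image</dc:type>
    </oai_dc:dc>
  </metadata>
</record>"""

rec = ET.fromstring(RECORD)
# The header is metadata about the record itself (identifier, datestamp, sets)...
header = {
    "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
    "datestamp": rec.findtext("oai:header/oai:datestamp", namespaces=NS),
    "sets": [s.text for s in rec.findall("oai:header/oai:setSpec", NS)],
}
# ...while the metadata element carries the description of the resource.
dc = rec.find("oai:metadata/oai_dc:dc", NS)
payload = {child.tag.split("}")[1]: child.text for child in dc}
print(header)
print(payload)
```

A harvester relies on the header (especially the datestamp) to decide what to fetch next, without needing to understand the metadata format inside.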

[Diagram: Metadata About the Resource]
What was OAI-PMH designed
for?
 Way to distribute records to other libraries
 Low barrier to entry for record providers
 Based on
 Records must be in XML

 OAI-PMH supports any metadata format encoded in XML;
Simple Dublin Core is the minimal format specified
 Not Z39.50
 Not a way to support federated search
 No “on-the-fly” sets
 More like CDS service, but it’s free:
 users “pull” records when they want, at intervals that are
convenient for them (every day, every hour, on any
schedule, or ad hoc)
OAI-PMH: Data Provider
 Has records to share
 Runs system that responds to requests
 following protocol
 Advertises base URL from which records
are harvestable
 Just leaves system running
 No human intervention needed to service
requests
 Can control level of activity to protect
performance for primary users
OAI-PMH: Service Provider
 Assumed to be providing “union catalog” service
 OAIster (http://www.oaister.org/)
 or a specialist, value-added service
 Sheet Music Consortium
(http://digital.library.ucla.edu/sheetmusic/)
 Harvests records, with the ability to limit selection to
 Records updated in a certain timespan
 Predetermined sets of records (like CDS)
 Known records by identifiers (OAI identifiers, not
LCCNs)
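The three selection options above correspond to OAI-PMH request arguments: from/until for a timespan, set for a predetermined set, and GetRecord with an identifier for a known record. Large result lists are returned in pages, continued with a resumptionToken. A Python sketch of the request URLs, with a hypothetical base URL and identifier.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     until=None, set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request.

    resumptionToken is an exclusive argument: when it is present,
    the other selective-harvesting arguments must be omitted.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date   # records added or changed since this date
        if until:
            params["until"] = until
        if set_spec:
            params["set"] = set_spec     # a set predefined by the data provider
    return f"{base_url}?{urlencode(params)}"


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Fetch one known record by its OAI identifier (not an LCCN)."""
    params = {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


BASE = "http://repository.example.org/oai"  # hypothetical data provider
print(list_records_url(BASE, from_date="2009-01-01", set_spec="photos"))
print(get_record_url(BASE, "oai:example.org:1234"))
```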

OAI Best Practices
Activities
 Sponsored by Digital Library Federation (DLF)
 Guidelines for data providers and service
providers
 http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl
 Not just DLF, also NSDL
 Best Practices for Shareable Metadata
 http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?PublicTOC
 Workshops to encourage DLF members to make
records for their digitized content harvestable
 Also sponsored by IMLS
OAI Example
OAIster

 A union catalog of digital resources. Provides access to digital resources by "harvesting" their descriptive metadata (records) using OAI-PMH.
 Currently provides access to 14,900,092 records from 939 contributors.
 http://www.oaister.org/

Crosswalking

“Crosswalks support conversion projects and semantic interoperability to enable searching across heterogeneous distributed databases. Inherently, there are limitations to crosswalks; there is rarely a one-to-one correspondence between the fields or data elements in different information systems.”
-- Mary Woodley, “Crosswalks: The Path to Universal Access?”

Crosswalks

 Semantic mapping of elements between source and target metadata standards
 Metadata conversion specification: transformations required to convert metadata record content from one standard to another
 Element to element mapping
 Hierarchy and object resolution
 Metadata content conversions
 Stylesheets are created to transform metadata based on crosswalks
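Element-to-element mapping can be pictured as a lookup table from source fields to target elements. The sketch below uses a small, simplified subset of MARC-to-Dublin Core correspondences (245 to title, 100 to creator, 260 $b/$c to publisher/date, 520 to description, 650 to subject). The record representation, a list of (tag, {subfield code: value}) pairs, is a simplification, and a real crosswalk would normally be implemented as a stylesheet with many more rules.

```python
from collections import defaultdict

# Simplified, partial crosswalk: (MARC tag, subfield codes, DC element).
CROSSWALK = [
    ("245", "ab", "title"),
    ("100", "a", "creator"),
    ("260", "b", "publisher"),
    ("260", "c", "date"),
    ("520", "a", "description"),
    ("650", "a", "subject"),
]


def marc_to_dc(fields):
    """Map (tag, {code: value}) fields to DC elements via CROSSWALK.

    Fields with no rule (e.g. a 500 note) are silently dropped, which is
    one way data gets lost in conversion.
    """
    dc = defaultdict(list)
    for tag, subfields in fields:
        for src_tag, codes, element in CROSSWALK:
            if tag != src_tag:
                continue
            parts = [subfields[c] for c in codes if c in subfields]
            if parts:
                dc[element].append(" ".join(parts))
    return dict(dc)


record = [
    ("245", {"a": "Metadata standards", "b": "a primer"}),
    ("100", {"a": "Doe, Jane"}),          # hypothetical data
    ("650", {"a": "Metadata"}),
    ("500", {"a": "General note"}),
]
print(marc_to_dc(record))
```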
Problems With Converted Records
 Differences in granularity (complex vs. simple scheme)
 Some data might be lost
 Differences in semantics
 Differences in use of content standards
 Properties may vary (e.g., repeatability)
 Converting may not always be the solution
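Repeatability is a concrete example of properties that vary. Dublin Core title is repeatable, but MARC 21 field 245 is not, so a converter must decide where the extra titles go. The sketch below assumes one possible policy, sending additional titles to 246 (Varying Form of Title, which is repeatable); this is an illustrative assumption, not a rule quoted from a published crosswalk, and it ignores indicators and subfield structure entirely.

```python
def dc_titles_to_marc(titles):
    """Map repeatable DC titles onto MARC's single, non-repeatable 245.

    Assumed policy: first title -> 245, every other title -> 246.
    Which title is "first" is itself a guess the converter has to make.
    """
    if not titles:
        return []
    fields = [("245", titles[0])]
    fields.extend(("246", title) for title in titles[1:])
    return fields


print(dc_titles_to_marc(["Annual report", "Rapport annuel"]))
```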

Example: Mapping MODS:title to DC:title
 Includes attribute for type of title
 Abbreviated
 Translated
 Alternative
 Uniform
 Other attributes: ID, authority, displayLabel, xlink
 Subelements: title, partName, partNumber, nonSort
 Title definition reused by: Subject, Related Item
Mapping MODS:title to DC:title
 DC has one element refinement: alternative
 DC title has no substructure; MODS allows subelements for partNumber and partName
 The DC-Lib best practice statement says to include the initial article; MODS parses it into <nonSort>
 MODS can link to a title in an authority file if desired
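The mapping decisions above can be sketched in a few lines: each top-level MODS titleInfo is flattened into one string (with the nonSort article put back in front, per the DC-Lib practice), untyped titles go to title, and typed titles go to the alternative refinement. Sending every typed title (abbreviated, translated, uniform) to alternative, and the ": " and ". " punctuation used when joining, are assumptions for illustration. Only direct children of <mods> are read, because titleInfo is also reused inside subject and relatedItem.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"


def _text(title_info, name):
    """Stripped text of a titleInfo subelement, or "" if absent."""
    el = title_info.find(MODS + name)
    return el.text.strip() if el is not None and el.text else ""


def flatten_title(title_info):
    """Collapse MODS title subelements into one DC title string."""
    # Keep the initial article: <nonSort> goes back in front of <title>.
    parts = (_text(title_info, "nonSort"), _text(title_info, "title"))
    title = " ".join(p for p in parts if p)
    subtitle = _text(title_info, "subTitle")
    if subtitle:
        title += ": " + subtitle
    for name in ("partNumber", "partName"):  # substructure is lost here
        part = _text(title_info, name)
        if part:
            title += ". " + part
    return title


def mods_titles_to_dc(mods_xml):
    root = ET.fromstring(mods_xml)
    result = {"title": [], "alternative": []}
    # findall() on the root returns direct children only, so titles nested
    # in <relatedItem> or <subject> are not mistaken for this item's title.
    for info in root.findall(MODS + "titleInfo"):
        target = "alternative" if info.get("type") else "title"
        result[target].append(flatten_title(info))
    return result


sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><nonSort>The </nonSort><title>art of cataloging</title><partNumber>Part 2</partNumber></titleInfo>
  <titleInfo type="alternative"><title>Cataloging art</title></titleInfo>
  <relatedItem type="series"><titleInfo><title>Library science series</title></titleInfo></relatedItem>
</mods>"""
print(mods_titles_to_dc(sample))
```

Once flattened, "Part 2" can no longer be told apart from the rest of the title, which illustrates why converting complex to simple schemes loses data.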
Metadata Crosswalks

 Dublin Core-MARC
 Dublin Core-MODS
 ONIX-MARC
 MODS-MARC
 EAD-MARC
 EAD-Dublin Core
 Etc.

Crosswalks

Library of Congress
http://www.loc.gov/marc/marcdocz.html

MIT
http://libraries.mit.edu/guides/subjects/metadata/mappings.html

Getty
http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html

MARC to DC Qualified
http://www.loc.gov/marc/marc2dc.html#qualifiedlist

NISO’s Metadata Principles
 1: Good metadata conforms to community standards in a way that is appropriate to the materials in the collection, users of the collection, and current and potential future uses of the collection.
 2: Good metadata supports interoperability.
 3: Good metadata uses authority control and content standards to describe objects and collocate related objects.

NISO’s Metadata Principles (continued)
 4: Good metadata includes a clear statement of the conditions and terms of use for the digital object.
 5: Good metadata supports the long-term curation and preservation of objects in collections.
 6: Good metadata records are objects themselves and therefore should have the qualities of good objects, including authority, authenticity, archivability, persistence, and unique identification.

Quality issues

 Defining quality
 Criteria for assessing quality
 Levels of quality
 Quality indicators

Determining and Ensuring Quality
 What constitutes quality?
 Techniques for evaluating and enforcing consistency and predictability
 Automated metadata creation: advantages and disadvantages
 Metadata maintenance strategies

Quality Measurement: Criteria
 Completeness
 Accuracy
 Provenance
 Conformance to expectations
 Logical consistency and coherence
 Timeliness (Currency and Lag)
 Accessibility
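Some of these criteria can be partly measured by machine. Completeness, for instance, can be approximated as the share of expected elements that actually carry a non-blank value. The sketch below is one simple way to do that; the list of expected elements is a hypothetical local choice (in practice it would come from an application profile), and a high score says nothing about accuracy or the other criteria.

```python
def completeness(record, expected):
    """Return (score, missing) for a record given as {element: [values]}.

    An element counts as present only if at least one value is non-blank.
    """
    missing = [e for e in expected
               if not any(v.strip() for v in record.get(e, []))]
    score = 1 - len(missing) / len(expected) if expected else 1.0
    return score, missing


EXPECTED = ["title", "creator", "date", "subject"]  # hypothetical profile
record = {"title": ["Sheet music sample"], "creator": [" "], "date": ["1910"]}
print(completeness(record, EXPECTED))
```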

Basic Quality Levels

 Semantic structure (“format,” “schema” or “element set”)
 Syntactic structure (administrative wrapper and technical encoding)
 Data values or content

Quality Indicators: Tier 1

 Technically valid
 Defined technical schema; automatic validation
 Appropriate namespace declarations
 Each element defined within a namespace; not necessarily machine-resolvable
 Administrative wrapper present
 Basic provenance (unique identifier, source, date)
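Several Tier 1 indicators can be checked automatically. The sketch below, using only Python's standard library, reports whether an OAI-PMH record is well-formed XML, whether every element is in a namespace, and whether the administrative wrapper (header) has an identifier and datestamp. It is an assumption-laden approximation: well-formedness is not full validation against the XML Schema, which needs a schema-validating parser, and the "source" part of basic provenance is not checked.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def tier1_problems(record_xml):
    """List Tier 1 problems found in one OAI-PMH <record> string."""
    try:
        root = ET.fromstring(record_xml)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for el in root.iter():
        if not el.tag.startswith("{"):  # ElementTree writes "{uri}name"
            problems.append(f"element <{el.tag}> has no namespace")
    header = root.find(OAI + "header")
    if header is None:
        problems.append("no administrative wrapper (header)")
    else:
        for name in ("identifier", "datestamp"):
            el = header.find(OAI + name)
            if el is None or not (el.text or "").strip():
                problems.append(f"header missing {name}")
    return problems


good = ('<record xmlns="http://www.openarchives.org/OAI/2.0/"><header>'
        '<identifier>oai:example.org:1</identifier>'
        '<datestamp>2009-02-01</datestamp></header><metadata/></record>')
print(tier1_problems(good))
print(tier1_problems("<record><header><identifier>x</identifier></header></record>"))
```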

Quality Indicators: Tier 2

 Controlled vocabularies
 Linked to publicly available sources of terms by unique tokens
 Elements defined and documented by a specific community
 Preferably an available application profile
 Full complement of general elements relevant to discovery
 Provenance at a more detailed level
 Methodology used in creation of metadata?
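The Tier 2 idea of terms "linked to publicly available sources of terms by unique tokens" can be checked in the same spirit: a subject is treated as controlled only if it carries an identifier (here, a URI) from a recognized vocabulary. The vocabulary base URI below is hypothetical, and representing subjects as (label, URI) pairs is an assumption made for the sketch.

```python
# Hypothetical base URI(s) of vocabularies the community has agreed on.
KNOWN_VOCABULARIES = ("http://vocab.example.org/subjects/",)


def uncontrolled_subjects(subjects):
    """Return labels of subjects lacking a token from a known vocabulary.

    subjects: list of (label, uri_or_None) pairs.
    """
    return [label for label, uri in subjects
            if not (uri and uri.startswith(KNOWN_VOCABULARIES))]


print(uncontrolled_subjects([
    ("Sheet music", "http://vocab.example.org/subjects/123"),
    ("old songs", None),  # free text, no token
]))
```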
Quality Indicators: Tier 3

 Expression of metadata intentions based on a documented application profile (AP) endorsed by a specialized community and registered in conformance to a general metadata standard
 Source of data with a known history of updating, including updated controlled vocabularies
 Full provenance information (including full source information), referencing practical documentation

Improving Metadata Quality …
 Documentation
 Basic standards, best practice guidelines, examples
 Exposure and maintenance of local and community vocabularies
 Application Profiles
 Training materials, tools, methodologies

Exercise

 Evaluate a small set of machine- and human-created metadata

