Ms A Powerpoint
Applications
Introduction:
Background, Goals,
and Course Outline
Version 2.1, February 2009
2
Cataloging for the 21st
Century
Background for this course:
The first of five courses developed as part
of:
Bibliographic Control of Web Resources: A
Library of Congress Action Plan
Action Item 5.3: Continuing Education (CE)
Continuing Education Implementation Group
(CEIG)
See course Bibliography for citations
3
Cataloging for the 21st Century:
The five CE course components
1. Rules and Tools for Cataloging Internet
resources
2. Metadata Standards and Applications
3. Principles of Controlled Vocabulary and
Thesaurus Design
4. Metadata and Digital Library Development
5. Digital Project Planning and Management
Basics
4
Cataloging for the 21st Century:
CE Course Series Objectives
To equip catalogers to deal with new types of resources and
to recognize their unique characteristics
To equip catalogers to evaluate competing approaches to and
standards for providing access to resources
To equip catalogers to think creatively and work
collaboratively with others inside and outside their home
institutions
To ensure that catalogers have a broad enough understanding
of the current environment to be able to make their local
efforts compatible and interoperable with other efforts
To prepare catalogers to be comfortable with ambiguity and
being less than perfect
To enable practicing catalogers to put themselves into the
emerging digital information environment and to continue to
play a significant role in shaping library services
5
Goals for this Course
6
Course objectives
Increase catalogers’ understanding of metadata for
digital resources
Evaluate competing approaches and standards for
managing and providing access to resources
Enable catalogers to think creatively and work
collaboratively
Increase understanding of current environment to
allow for compatibility among applications
Increase flexibility in utilizing different kinds of
metadata standards
Allow catalogers to use expertise to contribute to
the emerging digital information environment
7
Outline of this course
Session 1. Introduction to Digital Libraries
and Metadata
8
Outline of this course cont.
Session 3. Technical and Administrative
Metadata Standards
9
Outline of this course cont.
Session 6. Controlled Vocabularies
10
1. Introduction to Digital
Libraries and Metadata
2
Traditional vs. Digital
Libraries
5
http://www.worlddigitallibrary.org/project/english/index.html
How does the environment
affect the creation of
metadata?
8
Traditional Libraries
Differences:
Players: New world of metadata not
necessarily led by librarians
Goals: Competition for users critical for
sustainability
Resources: No real basis for understanding
non-technical needs (including metadata
creation and maintenance)
Many levels of content responsibility (or none)
11
Environmental Factors
Similarities
It’s about discovery (and access, and use and
meeting user needs)
Pressure for fast, cheap and “good enough”
(also rich, scalable, and re-usable--is that a
contradiction?)
Wide variety of materials and services
Maintenance needs often overlooked
12
What IS Metadata?
Some possibilities:
Data about data (or data about resources)
“Structured information that describes,
explains, locates, and otherwise makes it
easier to retrieve and use an information
resource.”
A management tool
Computer-processible, human-interpretable
information about digital and non-digital
objects
13
“In moving from dispersed digital collections to
interoperable digital libraries, the most important
activity we need to focus on is standards… most
important is the wide variety of metadata standards
[including] descriptive metadata… administrative
metadata…, structural metadata, and terms and
conditions metadata…”
Howard Besser, NYU
14
Metadata standards in digital
libraries
Interoperability and object exchange require the use of
established standards
Many digital objects are complex and are composed of
multiple files
XML is the de facto standard syntax for metadata descriptions
on the Internet
Complex digital objects require many more forms of metadata
than analog objects for their management and use
Descriptive
Administrative
Technical
Digital provenance/events
Rights/Terms and conditions
Structural
15
Functions of Metadata
16
Types of metadata
Descriptive
Administrative
Technical
Digital provenance
Rights/Access
Preservation
Structural
Meta-metadata
Other?
17
Cataloging and Metadata
18
Some differences between
traditional and digital libraries
Metadata only vs. actual object
Need to understand Web technologies
Types of media
Granularity
User needs
Web services
Digitized vs. born digital
Slide by Brian Surratt
19
One BIG Difference ...
21
Exercise
22
2. Specific metadata
standards: descriptive
Metadata Standards and
Applications Workshop
Session 2 Objectives
2
Outline of Session 2:
descriptive metadata
Types of descriptive metadata standards
(e.g. element sets, content standards)
Relationship models
3
Descriptive metadata
4
Aspects of descriptive
metadata
Data content standards (e.g., rules: AACR2R/RDA,
CCO)
Data value standards (e.g., values/controlled
vocabularies: LCNAF, LCSH, MeSH, AAT)
Data structure standards (e.g., formats/schemes:
DC, MODS, MARC 21)
Set of semantic properties, in this context used to
describe resource
Data exchange/syntax standards (e.g. MARC 21
(ISO 2709), MARCXML, DC/RDF or DC/XML)
The structural wrapping around the semantics
Relationship models
5
Content Standards: Rules
6
Content Standards: Value
Standards/Controlled
Vocabularies
Examples of thesauri
Library of Congress Subject Headings
Art & Architecture Thesaurus
Getty Thesaurus of Geographic Names
7
Data structure standards
(element sets and formats)
Facilitates database creation and record retrieval
Flexibility because not tied to a particular syntax
May provide a minimum of agreed upon
elements that facilitate record sharing and
minimal consistency
Different user communities develop their own
standard data element sets
May differ in complexity and granularity of fields
Some data element sets become
formats/schemes by adding rules such as
repeatability, controlled vocabularies used, etc.
8
Data Structure Standards:
Examples
MARC 21 (http://www.loc.gov/marc/)
MODS (http://www.loc.gov/standards/mods/)
IEEE-LOM (http://ltsc.ieee.org/wg12/)
ONIX (http://www.editeur.org/onix.html)
EAD (http://www.loc.gov/ead/)
9
Data Structure Standards:
Examples, cont.
VRA Core (http://www.vraweb.org/projects/vracore4/)
PBCore (http://www.pbcore.org/)
TEI (http://www.tei-c.org/index.xml)
10
What is MARC 21?
11
02158cam 2200349Ia
450000100130000000300060001300500170001900600190003600600190005500700150007400800
41000890400020001300200015001500430021001650490009001862450119001952460025
003142600065003395380030004045060038004345360153004725200764006255050094013895000
086014836000049015696500040016186510039016586510023016977000026017208560050017469
94001201796ocm56835268 OCoLC20060118051017.0m d szx w s 0 2cr mn---------
041028m20049999vau st 000 0 eng d aVA@cVA@dOCLCQ a0813922917 an-us---an-us-va
aVA@@04aThe Dolley Madison digital editionh[electronic resource] :bletters 1788-June 1836 /
cedited by Holly C. Shulman. 1iAlso known as:aDMDE aCharlottesville, Va. :bUniversity of Virginia
Press,c2004- aMode of access: Internet. aSubscription required for access. aRotunda editions are
made possible by generous grants from the Andrew W. Mellon Foundation and the President's Office
of the University of Virginia. aDolley Payne Madison was the most important First Lady of the
nineteenth century. The DMDE will be the first-ever complete edition of all of her known
correspondence, gathered in an XML-based archive. It will ultimately include close to 2,500 letters.
From the scattered correspondence were gathered letters that have never been previously published.
The range and scope of the collection makes this edition an important scholarly contribution to the
literature of the early republic, women's history, and the institution of the First Lady. These letters
present Dolley Madison's trials and triumphs and make it possible to gain admittance to her mind and
her private emotions and to understand the importance of her role as the national capital's First
Lady.0 aGeneral introduction -- Biographical introduction -- Introduction to the digital edition. aTitle
from the opening screen; description based on the display of Oct. 21, 2004.10aMadison,
Dolley,d1768-1849vCorrespondence. 0aPresidents' spouseszUnited States. 0aUnited
StatesxHistoryy1801-1809. 0aVirginiaxHistory.1 aShulman, Holly Cowan.40
uhttp://rotunda.upress.virginia.edu:8100/dmde/ aC0bVA@
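The raw record above uses the ISO 2709 exchange syntax: a 24-character leader, then a directory of fixed-width 12-character entries (3-digit tag, 4-digit field length, 5-digit starting offset), then the field data. The following is a minimal Python sketch of reading that directory, using the first three entries of the sample record. The function name `parse_directory` is illustrative and does not come from any MARC library.

```python
from typing import List, Tuple


def parse_directory(directory: str) -> List[Tuple[str, int, int]]:
    """Split an ISO 2709 directory string into (tag, length, start) entries."""
    if len(directory) % 12 != 0:
        raise ValueError("directory length must be a multiple of 12")
    entries = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3]          # field tag, e.g. "245"
        length = int(entry[3:7])  # field length, including the field terminator
        start = int(entry[7:12])  # offset from the start of the field data
        entries.append((tag, length, start))
    return entries


# First three directory entries of the sample record (fields 001, 003, 005).
sample = "001001300000" "003000600013" "005001700019"
entries = parse_directory(sample)
for tag, length, start in entries:
    print(tag, length, start)
```

Each entry's start plus its length gives the start of the next entry, which is one way to sanity-check a directory before reading the fields themselves.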
12
MARC 21 Scope
Bibliographic Data
books, serials, computer files, maps, music, visual
materials, mixed material
Authority Data
names, titles, name/title combinations, subjects, series
Holdings Data
physical holdings, digital access, location
Classification Data
classification numbers, associated captions, hierarchies
Community Information
events, programs, services, people, organizations
13
MARC 21 implementation
National formats were once common and there
were different flavors of MARC
Now most have harmonized with MARC 21 (e.g.
CANMARC, UKMARC, MAB)
Billions of records worldwide
Integrated library systems that support MARC
bibliographic, authority and holdings format
Wide sharing of records for 30+ years
OCLC is a major source of MARC records
14
Streamlining MARC 21
into the future
Take advantage of XML
Establish standard MARC 21 in an XML structure
Take advantage of freely available XML tools
Develop simpler (but compatible) alternatives
MODS
Allow for interoperability with different XML
metadata schemas
Assemble coordinated set of tools
Provide continuity with current data
Provide flexible transition options
15
MARC 21 evolution to XML
16
MARC 21 in XML – MARCXML
MARCXML record
XML exact equivalent of MARC (2709) record
Lossless/roundtrip conversion to/from MARC
21 record
Simple flexible XML schema, no need to
change when MARC 21 changes
Presentations using XML stylesheets
LC provides converters (open source)
http://www.loc.gov/standards/marcxml
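Because MARCXML keeps the MARC tags, indicators and subfield codes as XML attributes, generic XML tools can read a record without any MARC-specific software. The sketch below uses Python's standard `xml.etree.ElementTree` on an abbreviated, hand-written record based on the Dolley Madison example; the record content is trimmed for illustration and the helper name `subfields` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# MARCXML ("MARC 21 slim") namespace.
MARCXML_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated record for illustration; only two fields are included.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">ocm56835268</controlfield>
  <datafield tag="245" ind1="0" ind2="4">
    <subfield code="a">The Dolley Madison digital edition</subfield>
    <subfield code="h">[electronic resource] :</subfield>
    <subfield code="b">letters 1788-June 1836 /</subfield>
    <subfield code="c">edited by Holly C. Shulman.</subfield>
  </datafield>
</record>"""


def subfields(record, tag):
    """Return (code, text) pairs for every subfield of the given data field."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield"
    return [(sf.get("code"), sf.text) for sf in record.findall(path, MARCXML_NS)]


root = ET.fromstring(RECORD)
title_parts = subfields(root, "245")
title = " ".join(text for _, text in title_parts)
print(title)
```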
17
Example: MARC and
MARCXML
18
What is MODS?
20
Advantages of MODS
21
Uses of MODS
Extension schema to METS
Rich description works well with hierarchical METS
objects
To represent metadata for harvesting (OAI)
Language-based tags are more user-friendly
As a specified XML format for SRU
As a core element set for convergence
between MARC and non-MARC XML
descriptions
For original resource description in XML syntax
that is simpler than full MARC
22
Example: MODS
23
Status of MODS
24
A selection of MODS projects
LC uses of MODS
LC web archives
Digital library METS projects
University of Chicago Library
Chopin early editions
Finding aid discovery
Digital Library Federation Aquifer initiative
National Library of Australia
MusicAustralia: MODS as exchange format between National
Library of Australia and ScreenSound Australia
Australian national bibliographic database metadata project
See: MODS Implementation registry
http://www.loc.gov/mods/registry.php
25
What is MADS?
26
MADS elements
authority
name
titleInfo
topic
temporal
genre
geographic
hierarchicalGeographic
occupation
related
same subelements
variant
same subelements
Other elements:
note
affiliation
url
identifier
fieldOfActivity
extension
recordInfo
27
Uses of MADS
As an XML format for information about people,
organizations, titles, events, places, concepts
To expose library metadata in authority files
To allow for linking to an authoritative form
and fuller description of the entity from a
MODS record
For a simpler authority record than full MARC 21
authorities
To integrate bibliographic/authority information
for presentation
28
Example: MADS Name Record
<mads xmlns="http://www.loc.gov/mads/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mads/ http://www.loc.gov/mads/mads.xsd">
<authority>
<name type="personal">
<namePart>Smith, John</namePart>
<namePart type="date">1995-</namePart>
</name>
</authority>
<variant type="other">
<name>
<namePart>Smith, J</namePart>
</name>
</variant>
<variant type="other">
<name>
<namePart>Smith, John J</namePart>
</name>
</variant>
<note type="history">Biographical note about John Smith.</note>
<affiliation>
<organization>Lawrence Livermore Laboratory</organization>
<dateValid>1987</dateValid>
</affiliation>
</mads>
29
Example: MADS Organization Record
<mads xmlns="http://www.loc.gov/mads/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mads/
http://www.loc.gov/mads/mads.xsd">
<authority>
<name type="corporate">
<namePart>Unesco</namePart>
</name>
</authority>
<related type="parentOrg">
<name>
<namePart>United Nations</namePart>
</name>
</related>
<variant type="expansion">
<name>
<namePart>United Nations Educational, Cultural, and Scientific
Organization</namePart>
</name>
</variant>
</mads>
30
Some MADS implementations
31
Dublin Core: Simple
Simple to use
All elements are optional/repeatable
No order of elements prescribed
Interdisciplinary/International
Promotes semantic interoperability
Controlled vocabulary values may be expressed,
but not the sources of the values
32
Dublin Core Elements
34
DC Structure
35
Advantages: Dublin Core
36
Uses of Dublin Core
37
Ex.: Simple Dublin Core
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>3 Viennese arias: for soprano, obbligato clarinet in B flat, and
piano.</dc:title>
<dc:contributor>Lawson, Colin (Colin James)</dc:contributor>
<dc:contributor>Bononcini, Giovanni, 1670-1747.</dc:contributor>
<dc:contributor>Joseph I, Holy Roman Emperor, 1678-1711.</dc:contributor>
<dc:subject>Operas--Excerpts, Arranged--Scores and parts</dc:subject>
<dc:subject>Songs (High voice) with instrumental ensemble--Scores and
parts</dc:subject>
<dc:subject>M1506 .A14 1984</dc:subject>
<dc:subject></dc:subject>
<dc:subject></dc:subject>
<dc:date>1984</dc:date>
<dc:format>1 score (12 p.) + 2 parts ; 31 cm.</dc:format>
<dc:type>text</dc:type>
<dc:identifier>85753651</dc:identifier>
<dc:language>it</dc:language>
<dc:language>en</dc:language>
<dc:publisher>Nova Music</dc:publisher>
</metadata>
38
Ex.: Qualified Dublin Core
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title xml:lang="en">3 Viennese arias: for soprano, obbligato clarinet in B flat,
and piano.</dc:title>
<dc:contributor>Lawson, Colin (Colin James)</dc:contributor>
<dc:contributor>Bononcini, Giovanni, 1670-1747.</dc:contributor>
<dc:contributor>Joseph I, Holy Roman Emperor, 1678-1711.</dc:contributor>
<dc:subject xsi:type="dcterms:LCSH">Operas--Excerpts, Arranged--Scores and
parts</dc:subject>
<dc:subject xsi:type="dcterms:LCSH">Songs (High voice) with instrumental ensemble--
Scores and parts</dc:subject>
<dc:subject xsi:type="dcterms:LCC">M1506 .A14 1984</dc:subject>
<dc:date xsi:type="dcterms:W3CDTF">1984</dc:date>
<dcterms:extent>1 score (12 p.) + 2 parts ; 31 cm.</dcterms:extent>
<dc:type xsi:type="dcterms:DCMIType">Sound</dc:type>
<dc:identifier>85753651</dc:identifier>
<dc:language xsi:type="dcterms:RFC3066">it</dc:language>
<dc:language xsi:type="dcterms:RFC3066">en</dc:language>
<dc:publisher>Nova Music</dc:publisher>
</metadata>
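What qualification adds over simple Dublin Core is that each value can name the scheme it comes from, so software can treat an LCSH heading differently from an LC classification number. The Python sketch below groups subjects by scheme. It assumes the scheme is carried in an `xsi:type` attribute with `dcterms`-prefixed names, and it uses a shortened copy of the record; the function name `subjects_by_scheme` is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Shortened copy of the qualified record, with namespaces declared.
RECORD = """<metadata
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:subject xsi:type="dcterms:LCSH">Operas--Excerpts, Arranged--Scores and parts</dc:subject>
  <dc:subject xsi:type="dcterms:LCC">M1506 .A14 1984</dc:subject>
  <dc:language xsi:type="dcterms:RFC3066">it</dc:language>
  <dc:publisher>Nova Music</dc:publisher>
</metadata>"""


def subjects_by_scheme(xml_text):
    """Group dc:subject values by the encoding scheme named in xsi:type."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for subject in root.findall(f"{{{DC}}}subject"):
        # Subjects without an xsi:type are collected under "unqualified".
        scheme = subject.get(f"{{{XSI}}}type", "unqualified")
        grouped[scheme].append(subject.text)
    return dict(grouped)


grouped = subjects_by_scheme(RECORD)
for scheme, values in grouped.items():
    print(scheme, values)
```

Run against the simple Dublin Core record, the same function would place every subject under "unqualified", which is the practical cost of leaving the value sources unstated.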
39
Status of DC
40
A selection of DC projects
National Science Digital Library http://nsdl.org/
Aggregates a wide variety of source collections using
Dublin Core
Kentuckiana Digital Library http://kdl.kyvl.org/
For item-level metadata, on DLXS software
MusicBrainz http://musicbrainz.org/
User-maintained community music recording
database; extension of DC
41
Encoded Archival
Description (EAD)
Standard for electronic encoding of finding
aids for archival and manuscript collections
Expressed as an SGML/XML DTD
Supports archival descriptive practices and
standards
Supports discovery, exchange and use of
data
Developed and maintained by Society of
American Archivists; LC hosts the website
42
EAD, continued
44
Text Encoding Initiative (TEI)
Consortium of institutions and research
projects which collectively maintains and
develops guidelines for the representation of
texts in digital form.
Includes representation of title pages,
chapter breaks, tables of contents, as well
as poetry, plays, charts, etc.
The TEI file contains a “header” that holds
metadata about the digital file & about the
original source.
45
TEI
<fileDesc>
<titleStmt>
<title type="main">A chronicle of the conquest of
Granada</title>
<author>
<name type="last">Irving</name>
<name type="first">Washington</name>
<dateRange from="1783"
to="1859">1783-1859</dateRange>
</author>
</titleStmt>
<extent>455 kilobytes</extent>
<publicationStmt>
<publisher>University of Virginia Library</publisher>
<pubPlace>Charlottesville, Virginia</pubPlace>
<date value="2006">2006</date>
</publicationStmt>
<availability status="public">
<p n="copyright">Copyright © 2006 by the Rector and
Visitors of the University of Virginia</p>
<p n="access">Publicly accessible</p>
</availability>
46
MORE TEI
<sourceDesc>
<titleStmt>
<title type="main">A chronicle of the conquest of
Granada</title>
<author>
<name type="last">Irving</name>
<name type="first">Washington</name>
<dateRange from="1783"
to="1859">1783-1859</dateRange>
</author>
</titleStmt>
<extent>345 p. ; 21 cm.</extent>
<publicationStmt>
<publisher>Carey, Lea &amp; Carey</publisher>
<pubPlace>Philadelphia</pubPlace>
<date value="1829">1829</date>
<idno type="LC call number">DP122 .I7 1829a</idno>
<idno type="UVa Title Control Number">a1599744</idno>
</publicationStmt>
</sourceDesc>
47
Selection of TEI projects
48
VRA Core
Maintained by the Visual Resources Association.
Version 4 is currently in beta release.
Consists of a metadata
element set and an
initial blueprint for
how those elements
can be hierarchically
structured.
49
Work, Collection or Image
50
Advantages of VRA
51
VRA Core
<work>
<titleSet>
<title pref="true" source="LC NAF">Rotunda</title>
</titleSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF" refid="n 79089957">
Jefferson, Thomas</name>
<dates type="life">
<earliestDate>1743</earliestDate><latestDate>1826</latestDate>
</dates>
<role>architect</role>
<culture>American</culture>
</agent></agentSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF" refid="n 50020242">
White, Stanford</name>
<dates type="life">
<earliestDate>1853</earliestDate><latestDate>1906</latestDate>
</dates>
<role>architect</role>
<culture>American</culture>
<notes>Architect of 1896-1897 renovation</notes>
</agent></agentSet>
52
More VRA Core
<dateSet>
<date type="construction">
<earliestDate>1822</earliestDate><latestDate>1826</latestDate>
</date>
<notes>Construction begun October, 1822, completed September, 1826.</notes>
</dateSet>
<dateSet>
<date type="destruction">
<earliestDate>1895</earliestDate>
</date>
<notes>Burned October 27, 1895.</notes>
</dateSet>
<dateSet>
<date type="renovation">
<earliestDate>1896</earliestDate><latestDate>1897</latestDate>
</date>
<notes>Rebuilt to designs of Stanford White, 1896-1897.</notes>
</dateSet>
<locationSet><location type="site">
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville, Virginia</name>
</location></locationSet>
</work>
53
More VRA Core
<image>
<titleSet>
<title type="descriptive">general view</title>
</titleSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF"
refid="n 82111472">Lay, K. Edward</name>
<culture>American</culture>
<role>photographer</role>
</agent></agentSet>
<dateSet><date type="creation">
<earliestDate>1990</earliestDate>
<latestDate>2000</latestDate>
</date></dateSet>
<locationSet><location type="repository">
<name type="corporate">University of Virginia Library</name>
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville</name>
</location></locationSet>
<rightsSet>
<rights type="credit">K. Edward Lay</rights>
<rights type="access">Publicly accessible</rights>
</rightsSet>
</image>
54
More VRA Core
<image>
<titleSet>
<title type="descriptive">View from gymnasia</title>
</titleSet>
<agentSet><agent>
<name type="personal" vocab="LC NAF"
refid="n 82111472">Lay, K. Edward</name>
<culture>American</culture>
<role>photographer</role>
</agent></agentSet>
<dateSet><date type="creation">
<earliestDate>1995</earliestDate>
<latestDate>2000</latestDate>
</date></dateSet>
<locationSet><location type="repository">
<name type="corporate">University of Virginia Library</name>
<name type="geographic" vocab="TGN" refid="2002201">
Charlottesville</name>
</location></locationSet>
<rightsSet>
<rights type="credit">K. Edward Lay</rights>
<rights type="access">Publicly accessible</rights>
</rightsSet>
</image>
55
A Selection of VRA Core
Projects
Luna Imaging
http://www.lunaimaging.com/index.html
ARTstor
http://www.artstor.org/
Visual Information Access (VIA), Harvard
University Libraries
http://via.lib.harvard.edu/via/
56
Learning Object Metadata (LOM)
58
A Selection of IEEE-LOM
Projects
CanCore
http://www.cancore.ca/
LearnAlberta.ca
http://www.learnalberta.ca/
Grades K-12
Learning Object Repository Network
http://lorn.flexiblelearning.net.au/Home.aspx
59
What is ONIX for Books?
60
Advantages of ONIX
61
A selection of ONIX projects
http://www.editeur.org/onix.html
ONIX Administrators
EDItEUR (European & international)
Book Industry Communication (BIC) (European and
international)
Book Industry Study Group, Inc. (BISG) (U.S.)
Amazon.com
Association of American Publishers
Baker & Taylor
Barnes & Noble
Google
McGraw-Hill Companies
62
PBCore
64
Uses of PBCore
65
Selection of PBCore
projects
Wisconsin Public Television (WPT)
Media Library Online
http://wptmedialibrary.wpt.org/
Kentucky Educational Television
(KET) http://www.ket.org/
New Jersey Network (NJN)
http://www.njn.net/
66
Modeling metadata: why
use models?
To understand what entities you are
dealing with
To understand what metadata are
relevant to which entities
To understand relationships between
different entities
To organize your metadata to make
it more predictable (and be able to
use automated tools)
67
Descriptive metadata
models
Conceptual models for bibliographic and authority
data
Functional Requirements for Bibliographic Records
(FRBR)
Functional Requirements for Authority Data (FRAD)
Dublin Core Abstract Model (DCAM)
Some other models:
CIDOC Conceptual Reference Model (emerged from museum
community)
INDECS (for intellectual property rights)
There are many conceptual models intended for
different purposes
68
Bibliographic relationships
(pre-FRBR)
Tillett’s Taxonomy (1987)
Equivalence
Derivative
Descriptive
Whole-part
Accompanying
Sequential
Shared-characteristic
69
Bibliographic relationships in
MARC/MODS
MARC Linking entry fields
MARC relationships by specific
encoding format
Authority vs bibliographic vs holdings
MODS relationships
relatedItem types
Relationship to METS document
70
FRBR (1996)
71
FRBR Entities
72
Group 1 Entities and Relationships
Expression
An Expression “is embodied in” a Manifestation
A Manifestation “embodies” an Expression
Manifestation
A Manifestation “is exemplified by” an Item
An Item “exemplifies” a Manifestation
Item
74
[Diagram: description set model]
A record consists of descriptions, using properties and values.
A value can be a string or a pointer to another description.
A description set is instantiated as a record
Descriptions are grouped into a description set
A description has one or more statements
A statement has one property
A value may be a related description
75
Basic model: Resource with properties
A Play has the title “Antony and Cleopatra,” was written in 1606
by William Shakespeare, and is about “Roman history”
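The resource-with-properties model can be written down as subject, predicate, object statements. The Python sketch below does this for the play. The identifier `ex:antony-and-cleopatra`, the `ex:` prefix and the choice of Dublin Core property names are assumptions made for illustration, not part of the slide.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifier for the resource being described.
PLAY = "ex:antony-and-cleopatra"

triples = [
    Triple(PLAY, "rdf:type", "ex:Play"),
    Triple(PLAY, "dc:title", "Antony and Cleopatra"),
    Triple(PLAY, "dc:date", "1606"),
    Triple(PLAY, "dc:creator", "William Shakespeare"),
    Triple(PLAY, "dc:subject", "Roman history"),
]


def properties_of(resource: str, statements: List[Triple]) -> Dict[str, str]:
    """Collect the property/value pairs asserted about one resource."""
    return {t.predicate: t.obj for t in statements if t.subject == resource}


props = properties_of(PLAY, triples)
for name, value in props.items():
    print(name, "=", value)
```

Here every value is a plain string. Replacing "William Shakespeare" with an identifier for a person would turn that value into a pointer to another described resource.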
76
… related to other Resources
77
An Exercise
2
Types of administrative
metadata
Provides information to help manage a resource
Preservation metadata
Technical characteristics
Information about actions on an object
Structural metadata may be considered
administrative; indicates how compound
objects are put together
Rights metadata
3
PREMIS: introduction
4
Preservation metadata
includes:
Provenance:
Who has had custody/ownership of the
digital object?
Authenticity:
Is the digital object what it purports to be?
Technical Environment:
What is needed to render and use it?
Rights Management:
What IPR must be observed?
[Diagram labels: Content; Preservation Metadata; 10 years on; Forever!]
May 2005:
Data Dictionary for Preservation Metadata:
Final Report of the PREMIS Working Group
237-page report includes:
PREMIS Data Dictionary 1.0
Accompanying report
Special topics, glossary, usage examples
Data Dictionary: comprehensive, practical resource for
implementing preservation metadata in digital archiving systems
Used Framework as starting point
Detailed description of metadata elements
Guidelines to support implementation, use, management
Based on deep pool of institutional experiences in setting up and
managing operational capacity for digital preservation
Set of XML schema developed to support use of Data Dictionary
6
Scope of data dictionary
Implementation independent
Descriptive metadata out of scope
Technical metadata applying to all or
most format types
Media or hardware details are limited
Business rules are essential for working
repositories, but not covered
Rights information for preservation
actions, not access
7
What PREMIS is and is not
What PREMIS is:
Common data model for organizing/thinking about
preservation metadata
Guidance for local implementations
Standard for exchanging information packages between
repositories
What PREMIS is not:
Out-of-the-box solution: need to instantiate as metadata
elements in repository system
All needed metadata: excludes business rules, format-
specific technical metadata, descriptive metadata for
access, non-core preservation metadata
Lifecycle management of objects outside repository
Rights management: limited to permissions regarding
actions taken within repository
8
PREMIS Data Model
Intellectual Entities
Objects
Events
Agents
Rights Statements
9
Types of information covered in
PREMIS (by entity type)
Object: Object ID; Preservation level; Object characteristics (format, size, etc.); Storage; Environment; Digital signatures; Relationships; Linking identifiers
Event: Event ID; Event type; Event date/time; Event outcomes; Linking identifiers
Agent: Agent ID; Agent name
Rights: Rights statement; Granting agent; Permission granted
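One way to picture how these entity types fit together is as separate records tied to each other by linking identifiers. The Python sketch below is an illustration only: the class and field names are simplified stand-ins rather than PREMIS semantic unit names, and all values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    agent_id: str
    name: str


@dataclass
class DigitalObject:
    object_id: str
    preservation_level: str
    file_format: str
    size_bytes: int


@dataclass
class Event:
    event_id: str
    event_type: str
    date_time: str
    outcome: str
    # Linking identifiers point at the objects and agents involved.
    linking_object_ids: List[str] = field(default_factory=list)
    linking_agent_ids: List[str] = field(default_factory=list)


def events_for(object_id: str, events: List[Event]) -> List[Event]:
    """Return the events whose linking identifiers point at the given object."""
    return [e for e in events if object_id in e.linking_object_ids]


# All values below are invented for illustration.
scanner = Agent(agent_id="agent-1", name="Digitization unit")
tiff = DigitalObject("obj-1", "full", "image/tiff", 52_428_800)
events = [
    Event("evt-1", "capture", "2008-03-01T10:00:00", "success", ["obj-1"], ["agent-1"]),
    Event("evt-2", "fixity check", "2008-06-01T09:30:00", "success", ["obj-1"], ["agent-1"]),
    Event("evt-3", "capture", "2008-03-02T11:00:00", "success", ["obj-2"], ["agent-1"]),
]
history = [e.event_type for e in events_for(tiff.object_id, events)]
print(history)
```

Keeping events separate from objects means an object's history can grow without the object record itself being rewritten.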
10
PREMIS Maintenance Activity
http://www.loc.gov/standards/premis/
11
Current activities
PREMIS Implementers’ Registry
http://www.loc.gov/standards/premis/premis-registry.html
Revision of data dictionary and schemas (March
2008)
Guidelines for use of PREMIS within METS have
been developed
PREMIS tutorials
One- or one-and-a-half-day tutorials have been
given in several locations: Glasgow, Boston,
Stockholm, Albuquerque, Washington, San Diego, Berlin
Training materials available from LC
12
Why is PREMIS important to
catalogers?
As we take responsibility for more digital
materials, we need to ensure that they can be
used in the future
Most preservation metadata will be generated
from the object, but catalogers may need to
verify its accuracy
Catalogers may need to play a role in assessing
and organizing digital materials
Understanding the structure of complex digital objects
Determining significant properties that need to be
preserved
13
Technical metadata for
images
NISO Z39.87 and MIX
Adobe and XMP
Exif
IPTC (International Press
Telecommunications Council)/XMP
Some of these deal with embedded
metadata in images
14
Metadata For Images in
XML (MIX)
An XML Schema designed for expressing
technical metadata for digital still images
Based on the NISO Z39.87 Data
Dictionary – Technical Metadata for
Digital Still Images
Can be used standalone or as an
extension schema with METS/PREMIS
15
Using MIX
Includes
Characteristics that apply to all or most object types,
e.g. size, format (elements also in PREMIS)
Format-specific metadata for images
Some examples of format-specific metadata
elements in MIX:
Image width
Color space, color profile
Scanner metadata
Digital camera settings
Most well developed of the format-specific technical
metadata standards
16
Technical metadata for
textual objects
textMD is an XML Schema designed for
expressing technical metadata for textual
objects
Developed at New York University;
maintenance transferred to LC
Includes format specific technical
metadata for text, e.g.
byte order
character set encoding
font script
17
Technical metadata for
audio and video
Not as well developed as other technical metadata
Complexities of file formats require expertise to
develop these
LC developed XML technical metadata schemas in
2003/2004 for LC Audiovisual Prototype Project
used with METS; these were widely implemented
because of the lack of other schemas
Audio and video technical metadata schemas under
development by expert organizations
Moving Image Catalog (MIC) project is also
experimenting with these
18
Technical metadata for
multimedia (MPEG-7)
A multimedia content description standard,
associated with the content itself
Intended to allow fast and efficient searching
Formally called Multimedia Content
Description Interface
Does not deal with the actual encoding of
moving pictures and audio (as MPEG-1, MPEG-2
and MPEG-4 do)
intended to provide complementary functionality
to the previous MPEG standards
19
Structural metadata
20
Rights metadata
21
Rights schemas with limited
scope
METS Rights
Access rights for use with METS objects
Rights declarations
Rights holder
Context
CDL copyright schema
Specifically copyrights, not other intellectual property
rights
Information you need to know to assess copyright
status (e.g. creators, rights holders, dates, jurisdiction)
Note that a new field 542 has been added to MARC 21
with information about copyright to help the cataloger
assess the status of the item (based on the CDL work)
22
Rights schemas with limited
scope cont.
PREMIS Rights
Focused on rights for preservation rather than access
Revision of PREMIS data dictionary expanded this area
Allows for extensibility, i.e. inserting another rights
schema
Creative Commons
Allows creators to choose a license for their work
Simple rights statements that fit a lot of situations
http://creativecommons.org/
An example: MIC catalog
23
Rights metadata for specific
object types
PLUS for images
MPEG-21 REL for moving images,
etc.
ONIX for licensing terms
Full Rights Expression Languages
XrML/MPEG-21
ODRL
24
Exercise
Provide administrative/technical
metadata for the object used in the
descriptive metadata exercise
25
4. Metadata syntaxes and
containers
Metadata Standards and
Applications Workshop
Goals of session
2
Overview of Syntaxes
3
HTML
6
XML
7
XML: Extensible Markup
Language
A technical approach to convey meaning with data
Not a natural language, although it uses natural languages
<姓名>Louis Armstrong</姓名>
<name>Louis Armstrong</name>
Not a programming language
Language in the sense of:
A limited set of tags defines the elements that can be
used to markup data
The set of tags and their relationships need to be
explicitly defined (e.g., in XML schema)
We can build software that takes XML as input and
processes it in a meaningful way
You can define your own markup and schemas
8
XML is the lingua franca of
the Web
Web pages increasingly use at least XHTML
Business use for data exchange/ messaging
Family of technologies can be leveraged
XML Schema, XSLT, XPath, and XQuery
10
Anatomy of an XML Record
xmlns:dc="http://purl.org/dc/elements/1.1/"
Namespace Prefix
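The prefix in a namespace declaration is only a local shorthand; the namespace URI is what identifies the element. A small Python sketch using the standard `xml.etree.ElementTree` parser shows two documents that use different prefixes for the Dublin Core namespace and still yield the same element name. The `record` wrapper element and the prefix `x` are invented for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Same namespace URI, two different prefixes.
doc_a = f'<record xmlns:dc="{DC_NS}"><dc:title>Rotunda</dc:title></record>'
doc_b = f'<record xmlns:x="{DC_NS}"><x:title>Rotunda</x:title></record>'

# ElementTree expands each prefixed name to "{namespace-uri}localname".
tag_a = ET.fromstring(doc_a)[0].tag
tag_b = ET.fromstring(doc_b)[0].tag

print(tag_a)
print(tag_a == tag_b)
```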
12
XML Anatomy Lesson
13
XML Validation
Validator
Invalid
XML Schema
14
XML Schema Example
<xs:element name="software" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="swName" minOccurs="1" maxOccurs="1" type="xs:string"></xs:element>
      <xs:element name="swVersion" minOccurs="0" maxOccurs="1" type="xs:string"></xs:element>
      <xs:element name="swType" minOccurs="1" maxOccurs="1" type="xs:string"></xs:element>
      <xs:element name="swOtherInformation" minOccurs="0" maxOccurs="unbounded" type="xs:string"></xs:element>
      <xs:element name="swDependency" minOccurs="0" maxOccurs="unbounded" type="xs:string"></xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
15
Will the following XML
instance validate?
<software>
<swName>Windows</swName>
<swVersion>2000</swVersion>
<swType>Operating System</swType>
</software>
<swVersion>2000</swVersion>
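One way to reason about the question is to count each child element against the minOccurs/maxOccurs values in the schema example. The sketch below does only that, with Python's standard library. It is not an XML Schema validator: it ignores element order, data types, and unexpected elements, and the function name occurrence_errors is a made-up helper.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) for each child of <software>, copied from the
# schema example; None stands for "unbounded".
RULES = {
    "swName": (1, 1),
    "swVersion": (0, 1),
    "swType": (1, 1),
    "swOtherInformation": (0, None),
    "swDependency": (0, None),
}

def occurrence_errors(xml_text):
    """Return occurrence violations for a single <software> element."""
    element = ET.fromstring(xml_text)
    errors = []
    for name, (low, high) in RULES.items():
        count = len(element.findall(name))
        if count < low or (high is not None and count > high):
            errors.append(f"{name}: found {count}")
    return errors

instance = """<software>
  <swName>Windows</swName>
  <swVersion>2000</swVersion>
  <swType>Operating System</swType>
</software>"""

print(occurrence_errors(instance))
```

With these rules, a second swVersion inside the same software element would be reported, because maxOccurs for swVersion is 1.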
16
Resource Description
Framework
A language for describing resources on the Web
Structure based on “triples”
Designed to be read by computers, not humans
An ontology language to support semantic interoperability
—understanding meanings
Considered an essential part of the Semantic Web
Can be expressed using XML
Predicate
Subject Object
http://www.w3.org/RDF
17
Some RDF Concepts
20
An RDF/XML Example
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.dlib.org">
    <dc:title>D-Lib Program - Research in Digital Libraries</dc:title>
    <dc:description>The D-Lib program supports the community of people
      with research interests in digital libraries and electronic
      publishing.</dc:description>
    <dc:publisher>Corporation For National Research Initiatives</dc:publisher>
    <dc:date>1995-01-07</dc:date>
    <dc:subject>
      <rdf:Bag> <!-- Note: unordered list -->
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:subject>
    <dc:type>World Wide Web Home Page</dc:type>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>
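To make the subject/predicate/object structure concrete, the following sketch reads a trimmed version of the example and lists its triples. It uses only Python's standard library and handles only plain literal properties directly under rdf:Description; containers such as rdf:Bag are deliberately left out. A real application would use a full RDF parser. The function name simple_triples is an illustrative assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the example: two literal-valued properties only.
doc = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.dlib.org">
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>"""

def simple_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for literal properties."""
    triples = []
    for desc in ET.fromstring(rdf_xml).findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            # ElementTree stores tags as "{uri}local"; the predicate URI
            # is the namespace URI followed by the local name.
            predicate = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, predicate, prop.text))
    return triples

for triple in simple_triples(doc):
    print(triple)
```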
21
Overview of container formats
http://www.loc.gov/standards/mets/
23
METS Usage
24
Characteristics of METS
25
METS Sections
26
The structure of a METS file
METS
fileSec: file inventory
structMap: structural map
27
Linking in METS Documents
(XML ID/IDREF links)
DescMD
mods
relatedItem
AdminMD
relatedItem
techMD
sourceMD
digiprovMD
rightsMD
fileGrp
file
file
StructMap
div
div
fptr
div
fptr
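The ID/IDREF linking can be checked mechanically. The sketch below, in standard-library Python, collects every ID in a METS document and reports DMDID and FILEID references that point nowhere. The sample document is a bare, hypothetical skeleton (a real dmdSec would wrap or reference actual metadata), and dangling_links is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton: one dmdSec, two files, and a structMap whose
# second fptr points at a file ID that does not exist.
doc = """<mets xmlns="http://www.loc.gov/METS/">
  <dmdSec ID="DMD1"/>
  <fileSec>
    <fileGrp>
      <file ID="FILE1"/>
      <file ID="FILE2"/>
    </fileGrp>
  </fileSec>
  <structMap>
    <div DMDID="DMD1">
      <div><fptr FILEID="FILE1"/></div>
      <div><fptr FILEID="FILE3"/></div>
    </div>
  </structMap>
</mets>"""

def dangling_links(mets_xml):
    """Return (attribute, value) pairs whose target ID is missing."""
    root = ET.fromstring(mets_xml)
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    missing = []
    for el in root.iter():
        for attr in ("DMDID", "FILEID"):
            # DMDID may carry several space-separated references.
            for ref in (el.get(attr) or "").split():
                if ref not in ids:
                    missing.append((attr, ref))
    return missing

print(dangling_links(doc))
```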
28
METS extension schemas
33
Descriptive Metadata Section
(dmdSec)
<mets>
<dmdSec></dmdSec>
<fileSec></fileSec>
<structMap></structMap>
</mets>
34
METS examples
35
MPEG-21 Digital Item
Declaration (DID)
ISO/IEC 21000-2: Digital Item Declaration
An alternative to represent Digital Objects
Starting to get supported by some repositories,
e.g., aDORe, DSpace, Fedora
A flexible and expressive model that easily
represents compound objects (recursive “item”)
MPEG DID is an ISO standard and has industry
support, but is often implemented in a proprietary
way and its standards development is closed; METS
is open source and developed through open discussion,
mainly within the cultural heritage community
36
Abstract Model for
MPEG-21 DID
resource: datastream
37
Exercise
38
5. Applying metadata
standards: Application
profiles
Metadata Standards and
Applications Workshop
Goals of Session
2
Overview of session
MODS
3
Why Application profiles?
5
Components of an
Application Profile
Human readable documentation
Property descriptions and relationships
Domain or project specific instruction
Obligation and constraints
Machine-readable versions may contain:
Specific encoding decisions and XML or RDF schemas
Models of data relationships specific to the AP
represented in the schemas
Functional requirements and use cases supporting
decisions
6
Using Properties from
other Schemas
DC APs set stringent requirements for
determining reusability of terms:
Is the term a real “property” and defined as such
within the source schema?
Is the term declared properly, with a URI and
adequate documentation and support?
In general, properties whose meaning is partly or
wholly determined by their place in a hierarchy are
not appropriate for reuse in DC APs without
reference to the hierarchy.
Other styles of profiles have different
requirements and strategies for developing
machine-readability and validation
7
Documenting new
properties
Minimum: a web page, with the
relevant information available to
other implementations
Better: a web page and an
accessible schema using your terms
as part of your application profile
Best: all terms available on a
distributed registry
8
Singapore Framework
9
DC Application Profile
Examples
Collections AP
http://www.dublincore.org/groups/collections/collection-application-profile/2007-03-09/
Scholarly Works Application Profile (SWAP)
http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
Both these have been reviewed by the DC
Usage Board and are deemed compliant
with the DC Abstract Model
10
An RDA Application Profile
11
METS Profiles
Description of a class of METS documents
provides document authors and programmers
guidance to create and process conformant METS
documents
XML document using a schema
Expresses the requirements that a METS document
must satisfy
“Data standard” in its own right
A sufficiently explicit METS Profile may be
considered a “data standard”
METS Profiles are output in human-readable prose
and not intended to be “machine actionable” (but they
use a standard XML schema)
12
Components of a METS
Profile
1. Unique URI
2. Short Title
3. Abstract
4. Date and time of creation
5. Contact Information
6. Related profiles
7. Extension schemas
8. Rules of description
9. Controlled vocabularies
10. Structural requirements
11. Technical requirements
12. Tools and applications
13. Sample document
13
Case study of a METS
Profile
LC Audio Compact Disc Profile
Features:
Specifies MODS for descriptive metadata
Specifies description rules as AACR2
Specifies controlled vocabularies used in various
elements
dmdSec requirements 2 and 3 specify use of
relatedItem type=“constituent” if there are multiple
works on the CD
Specifies how to detail the physical structure,
whether multiple CDs or multiple tracks on a CD
(structMap requirements 2 and 3)
14
MODS Profiles
A common data format for searching and display
Harvest of American Memory Objects
Profile-based METS Object
21
Example: Using a profile
as an application
METS Photograph Profile
William P. Gottlieb Collection
Portrait of Louis Armstrong
Photographic object
Converted a file of 1600 MARC records,
using marc4j, to an XML
modsCollection (single file).
Used an XSLT stylesheet to create 1600
records conforming to the
METS photograph profile.
22
Logical & Physical Relationships
Logical (MODS)
<mods:mods ID="ver01">
  <mods:titleInfo>
    <mods:title>Original Work</mods:title>
  </mods:titleInfo>
  <mods:relatedItem type="otherVersion" ID="ver02">
    <mods:titleInfo>
      <mods:title>Derivative Work 1</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
  <mods:relatedItem type="otherVersion" ID="ver03">
    <mods:titleInfo>
      <mods:title>Derivative Work 2</mods:title>
    </mods:titleInfo>
  </mods:relatedItem>
</mods:mods>
mods:mods and mods:relatedItem type="otherVersion"
elements create a sequence of 3 nodes
Physical (METS structMap)
<mets:structMap>
  <mets:div TYPE="photo:photoObject" DMDID="MODS1">
    <mets:div TYPE="photo:version" DMDID="ver01">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10081"/>
      </mets:div>
    </mets:div>
    <mets:div TYPE="photo:version" DMDID="ver02">
      <mets:div TYPE="photo:image">
        <mets:fptr FILEID="FN10090"/>
      </mets:div>
      <mets:div TYPE="photo:version" DMDID="ver03">
        <mets:div TYPE="photo:image">
          <mets:fptr FILEID="FN1009F"/>
        </mets:div>
      </mets:div>
    </mets:div>
  </mets:div>
</mets:structMap>
div TYPE="photo:version" elements
correspond to the 3 nodes using a logical
sequence of ID to DMDID relationships
23
Using a METS profile-based
approach
Ability to model complex library objects
Use of open source software tools
Use of XML for data creation, editing, storage and
searching
Use of XSLT for…
Legacy data conversion
Batch METS creation and editing
Web displays and behaviors
Creation of multiple outputs from XML
HTML/XHTML for Web display; PDF for printing
Ability to aggregate disparate data sources into a
common display
24
Closing thoughts on
application profiles
Many metadata standards are sufficiently
flexible that profiling is necessary
Documenting what is used in an application
will simplify and enhance data presentation,
conversion from other sources, ability to
provide different outputs
Constraining a metadata standard by
specifying what is used and how facilitates
data exchange and general interoperability
25
Exercise: critique an
application profile
University of Maryland Descriptive
Metadata
http://www.lib.umd.edu/dcr/publications/taglibrary/umdm.html
UVa DescMeta
http://lib.virginia.edu/digital/metadata/descriptive.html
Texas Digital Library profile for electronic
theses and dissertations
http://www.tdl.org/documents/ETD_MODS_profile.pdf
26
Exercise: Questions to address
Does the profile define its user community and
expected uses?
How usable would the profile be for a potential
implementer?
How (well) does the profile specify element/term
usage?
How (well) does the profile define and manage
controlled vocabularies?
Does the profile use existing metadata standards?
Are there key anomalies, omissions, or
implementation concerns?
27
6. Controlled vocabularies
Metadata Standards and
Applications Workshop
Goals of Session
2
Why controlled vocabularies?
3
Why bother?
4
Improving Recall and
Precision
Controlled Vocabularies improve recall by
addressing synonyms [attire vs. dress vs.
clothing]
5
Types of Controlled
Vocabularies
Lists of enumerated values
Synonym rings
Taxonomy
Thesaurus
Classification Schemes
Ontology
6
Lists
7
Synonym Rings
Synonym rings are used to expand queries for
content objects
If a user enters any one of these terms as a query to the
system, all items are retrieved that contain any of the terms
in the cluster
Synonym rings are often used in systems where the
underlying content objects are left in their
unstructured natural language format
the control is achieved through the interface by drawing
together similar terms into these clusters
Synonym rings are used in conjunction with search
engines and provide a minimal amount of control of
the diversity of the language found in the texts of the
underlying documents
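A minimal sketch of query expansion with a synonym ring, in Python. The ring reuses the attire/dress/clothing example from the recall slide; the three document strings and the function names are invented for illustration.

```python
# Each ring is a cluster of terms treated as equivalent at query time.
RINGS = [
    {"attire", "dress", "clothing"},
]

# Hypothetical unstructured texts, keyed by document number.
DOCS = {
    1: "a history of court dress",
    2: "clothing of the 1920s",
    3: "jazz photographs",
}

def expand(term):
    """Return every term in the ring containing `term`, or just the term."""
    for ring in RINGS:
        if term in ring:
            return ring
    return {term}

def search(term):
    """Retrieve documents containing any term from the expanded query."""
    terms = expand(term.lower())
    return sorted(
        doc_id for doc_id, text in DOCS.items()
        if terms & set(text.lower().split())
    )

print(search("attire"))  # documents 1 and 2 match via "dress" and "clothing"
print(search("jazz"))    # no ring applies, so only the literal term is used
```

The documents themselves are never changed; all of the control sits in the query layer, which is why this approach suits unstructured text.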
8
Taxonomies
10
Ontology
11
Thesaural Relationships
Relationship types:
Use/Used For – indicates preferred term
Hierarchy – indicates broader and
narrower terms
Associative – almost unlimited types of
relationships may be used
12
13
Z39.19 Types of Concepts
Birds (things)
Ornithology (discipline)
Feathers (materials)
Flying (activity or process)
Bird counts (event)
Barn Owl (unique entity)
15
Relationships
Equivalence
Hierarchical
Associative
16
Equivalence Relationships
A=B
17
Hierarchical Relationships
A
B
18
Associative Relationships
A B
19
Expressing Relationship
20
Vocabulary Management
23
Encoding Controlled
Vocabularies
MARC 21
Authority Format used for names, subjects,
series
Classification Format used for formal
classification schemes
MADS (a derivative of MARC)
Simple Knowledge Organization System
(SKOS)
Intended for concepts
24
New/Upcoming
Standards: Authorities
Functional Requirements for Authority Data (FRAD)
A new model for authority information
Developed by the IFLA Working Group on Functional
Requirements and Numbering of Authority Records
(FRANAR)
VIAF (Virtual International Authority File)
Prototype at: http://orlabs.oclc.org/viaf/
A Review of the Feasibility of an International
Authority Data Number (ISADN)
Simple Knowledge Organization System (SKOS)—a
W3C standard
25
Functions of the Authority File
Document decisions
Serve as reference tool
Control forms of access points
Support access to bibliographic files
Link bibliographic and authority
files
(Slide from Glenn Patton)
26
FRANAR Concept Model, top
27
FRANAR Concept Model, bottom
28
FRAD person attributes
From FRBR (AACR2 additions to names):
Dates associated with the person
Title of person
Other designation associated with the person
New:
Gender
Place of birth
Place of death
Country
Place of residence
Affiliation
Address
Language of person
Field of activity
Profession/occupation
Biography/history
(Slide from Ed Jones)
29
SKOS
30
SKOS & RDF
31
The skos:Concept class allows you to assert that a
resource is a conceptual resource.
That is, the resource is itself a concept.
32
Preferred and Alternative Lexical
Labels
33
The RDF/XML Encoded
Version
34
Example of ISO 639-2 language code in SKOS
<rdf:Description rdf:about=
    "http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-latn">This Concept has not yet been
    defined.</skos:definition>
  <skos:inScheme rdf:resource=
      "http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource=
      "http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
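As a rough illustration of how software might read the preferred and alternative labels, the sketch below parses a cut-down copy of the record with Python's standard library. The wrapping rdf:RDF element and namespace declarations are added here so the fragment parses on its own; the SKOS namespace URI is the one used in the slide's example.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2008/05/skos#"  # namespace as given in the slide
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Cut-down copy of the record, wrapped so that it is a complete document.
doc = f"""<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-latn">portugais</skos:altLabel>
  </rdf:Description>
</rdf:RDF>"""

concept = ET.fromstring(doc)[0]

# One preferred label, plus alternative labels keyed by language tag.
pref = concept.find(f"{{{SKOS}}}prefLabel").text
alt = {
    el.get(XML_LANG): el.text
    for el in concept.findall(f"{{{SKOS}}}altLabel")
}

print(pref)
print(alt)
```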
35
Registries: the Big Picture
(Adapted from Wagner & Weibel, “The Dublin Core Metadata Registry:
Requirements, Implementation, and Experience” JoDI, 2005)
36
Why Registries?
Support interoperability
Discovery of available schemes and schemas
for description of resources
Promote reuse of extant schemes and
schemas
Access to machine-readable and human-
readable services
Support for crosswalking and translation
Coping with different metadata schemes
37
Declaration, documentation,
publication
To identify the source of a
vocabulary, e.g., a term comes from
LCSH, as identified in my metadata
by a URI
To clarify a term and its definition
To publish controlled vocabularies
and have access to information
about each term
38
Some uses for registries
Metadata Schemas
Crosswalks between metadata schemas
Controlled Vocabularies
Mappings between vocabularies
Application Profiles
Schema and vocabulary information in
combination
39
Metadata registries
40
Example from Dublin Core Registry—Term Level
41
7. Approaches to Models of
Metadata Creation, Storage and
Retrieval
Metadata Standards
and Applications
Goals of Session
Retrieval/Discovery
2
Creating metadata records
6
Combination Approaches
9
Content with metadata
Examples:
HTML pages with embedded ‘meta’ tags
Library catalogs
Web-based catalogs often provide some
services for digital content
Electronic Resource Management Systems
(ERMS)
Provide metadata records for title level only
Metadata aggregations
Using API or OAI-PMH for harvest and re-
distribution
11
Service only
Library catalogs
Web-based (“Amazoogle”)
Portals and federations
13
Library Catalogs
Based on a consensus that granular
metadata is useful
Expectations of uniformity of information
content and presentation
Designed to optimize recall and precision
Addition of relevance ranking and keyword
searching of limited value (only ‘text’ used
is the metadata itself)
Retrieval options limited by LMS vendor
decisions
14
New Library Catalogs
ENDECA
North Carolina State University Libraries, in 2006, was
one of the first to experiment with new catalog
technologies using legacy metadata
15
http://www.lib.ncsu.edu/
Web-based
19
XML based digital library
application
Similar to a portal application
May use a database for record creation and
maintenance
Often uses open source tools
Files are indexed for searching and
presented on the Web using an XML based
publishing framework
Combines some of the other metadata
creation, storage and retrieval approaches
http://www.loc.gov/performingarts/
20
Information Discovery and
Retrieval
Z39.50
SRU
21
Z39.50
25
Can You Tell?
2
Tools For Sharing
Metadata/Interoperability
Protocols
OpenURL for reference linking
OAI-PMH for harvesting
3
What’s the Point of
Interoperability?
For users, it’s about resource discovery
(user tasks)
What’s out there?
Is it what I need for my task?
Can I use it?
For resource creators, it’s about distribution
and marketing
How can I increase the number of people who
find my resources easily?
How can I justify the funding required to make
these resources available?
4
What’s an OpenURL?
5
Additional OpenURL
Services
Link from a record in an abstracting and indexing
database (A&I) to the full-text described by the
record
Link from a record describing a book in a library
catalogue to a description of the same book in an
Internet book shop
Link from a reference in a journal article to a
record matching that reference in an A&I
database
Link from a citation in a journal article to a record
in a library catalogue that shows the library
holdings of the cited journal
6
OpenURL Examples &
Demo
http://sfxserver.uni.edu/sfxmenu?issn=1234-5678&date=1998&volume=12&issue=2&spage=134
An OpenURL demo:
http://www.ukoln.ac.uk/distributed-systems/openurl/
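The example link is simply a resolver address followed by citation metadata as key/value pairs. A small Python sketch, using only urllib.parse, shows how such a link could be assembled and taken apart again. The resolver address and citation values are the ones in the example; the variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Resolver address from the example above.
BASE = "http://sfxserver.uni.edu/sfxmenu"

# Citation metadata for the article being linked to.
citation = {
    "issn": "1234-5678",
    "date": "1998",
    "volume": "12",
    "issue": "2",
    "spage": "134",
}

# Build the link: base URL + "?" + encoded key/value pairs.
openurl = f"{BASE}?{urlencode(citation)}"
print(openurl)

# A resolver does the reverse: it splits the query back into metadata.
parsed = {k: v[0] for k, v in parse_qs(urlsplit(openurl).query).items()}
print(parsed)
```

Because the metadata travels in the link itself, the same citation can be sent to a different institution's resolver just by swapping the base address.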
7
OAI-PMH
Open Archives Initiative Protocol for Metadata
Harvesting (http://www.openarchives.org/)
Roots in the ePrint community, although
applicability is much broader
Mission: “The Open Archives Initiative
develops and promotes interoperability
standards that aim to facilitate the efficient
dissemination of content.”
Content in this context is actually “metadata
about content”
8
OAI-PMH in a Nutshell
9
Metadata
About the
Resource
10
What was OAI-PMH designed
for?
Way to distribute records to other libraries
Low barrier to entry for record providers
Based on
Records must be in XML
No “on-the-fly” sets.
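Part of the low barrier to entry is that a harvest request is an ordinary HTTP URL with a few arguments. The sketch below builds a ListRecords request in Python. The verb and the metadataPrefix, set, and from arguments are defined by OAI-PMH; the base URL http://example.org/oai and the function name are placeholders, not a real repository.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (nothing is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Sets are predefined by the repository, not created on the fly.
        params["set"] = set_spec
    if from_date:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"

# Placeholder base URL for illustration only.
print(list_records_url("http://example.org/oai", from_date="2009-01-01"))
```

The response to such a request is an XML document, which matches the requirement that records be in XML.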
13
14
OAI Best Practices
Activities
Sponsored by Digital Library Federation (DLF)
Guidelines for data providers and service
providers
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl
Not just DLF, also NSDL
Best Practices for Shareable Metadata
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?PublicTOC
Workshops to encourage DLF members to make
records for their digitized content harvestable
Also sponsored by IMLS
15
OAI Example
20
OAIster
21
http://www.oaister.org/
Crosswalking
24
Crosswalks
26
Example: Mapping MODS:title
to DC:title
Includes attribute for type of title
Abbreviated
Translated
Alternative
Uniform
Other attributes: ID, authority, displayLabel,
xLink
Subelements: title, partName, partNumber,
nonSort
Title definition reused by: Subject, Related
Item
27
Mapping MODS:title to
DC:title
DC has one element refinement:
alternative
DC title has no substructure; MODS
allows for subelements for partNumber,
partName
Best practice statement in DC-Lib says
include initial article; MODS parses into
<nonSort>
MODS can link to a title in an authority
file if desired
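The differences above can be sketched as a small crosswalk function in Python. It flattens one MODS titleInfo into a single DC string, puts the nonSort article back in front (since DC keeps the initial article), and maps type="alternative" to the DC alternative refinement. The ". " separator between parts, the sample title, and the function name title_to_dc are assumptions for illustration, not a published mapping.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

def title_to_dc(title_info):
    """Flatten one mods:titleInfo element into (dc_element, string)."""
    def text(name):
        el = title_info.find(MODS + name)
        return el.text.strip() if el is not None and el.text else ""

    # DC has no nonSort: the initial article is restored to the title.
    main = " ".join(p for p in (text("nonSort"), text("title")) if p)
    # DC has no substructure: part number and name are appended.
    parts = [p for p in (main, text("partNumber"), text("partName")) if p]
    # "alternative" is the one title refinement DC offers.
    is_alt = title_info.get("type") == "alternative"
    return ("alternative" if is_alt else "title"), ". ".join(parts)

# Hypothetical sample record.
sample = ET.fromstring(
    '<titleInfo xmlns="http://www.loc.gov/mods/v3">'
    "<nonSort>The</nonSort><title>Jazz Age</title>"
    "<partNumber>Part 2</partNumber></titleInfo>"
)
print(title_to_dc(sample))
```

The mapping is lossy in one direction: once the parts are joined into one string, the MODS substructure cannot be recovered from the DC value.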
28
Metadata Crosswalks
Dublin Core-MARC
Dublin Core-MODS
ONIX-MARC
MODS-MARC
EAD-MARC
EAD-Dublin Core
Etc.
29
Crosswalks
Library of Congress
http://www.loc.gov/marc/marcdocz.html
MIT
http://libraries.mit.edu/guides/subjects/metadata/mappings.html
Getty
http://www.getty.edu/research/conducting_research/standards/intrometadata/crosswalks.html
30
MARC to DC Qualified
http://www.loc.gov/marc/marc2dc.html#qualifiedlist
32
NISO’s Metadata Principles
1: Good metadata conforms to
community standards in a way that is
appropriate to the materials in the
collection, users of the collection, and
current and potential future uses of the
collection.
2: Good metadata supports
interoperability.
3: Good metadata uses authority control
and content standards to describe objects
and collocate related objects
33
NISO’s Metadata Principles
Continued
4: Good metadata includes a clear statement of
the conditions and terms of use for the digital
object.
5: Good metadata supports the long-term
curation and preservation of objects in collections.
6: Good metadata records are objects themselves
and therefore should have the qualities of good
objects, including authority, authenticity,
archivability, persistence, and unique
identification.
34
Quality issues
Defining quality
Criteria for assessing quality
Levels of quality
Quality indicators
35
Determining and Ensuring
Quality
What constitutes quality?
Techniques for evaluating and
enforcing consistency and
predictability
Automated metadata creation:
advantages and disadvantages
Metadata maintenance strategies
36
Quality Measurement:
Criteria
Completeness
Accuracy
Provenance
Conformance to expectations
Logical consistency and coherence
Timeliness (Currency and Lag)
Accessibility
37
Basic Quality Levels
38
Quality Indicators: Tier 1
Technically valid
Defined technical schema; automatic
validation
Appropriate namespace declarations
Each element defined within a namespace;
not necessarily machine-resolvable
Administrative wrapper present
Basic provenance (unique identifier, source,
date)
39
Quality Indicators: Tier 2
Controlled vocabularies
Linked to publicly available sources of terms
by unique tokens
Elements defined and documented by a
specific community
Preferably an available application profile
Full complement of general elements
relevant to discovery
Provenance at a more detailed level
Methodology used in creation of metadata?
40
Quality Indicators: Tier 3
41
Improving Metadata
Quality …
Documentation
Basic standards, best practice
guidelines, examples
Exposure and maintenance of local and
community vocabularies
Application Profiles
42
Exercise
43