Expressing Dublin Core in Topic Maps
Steve Pepper
Ontopedia
Oslo, Norway
1 Introduction
The Dublin Core Metadata Initiative (DCMI) is an “open organization engaged in the
development of interoperable online metadata standards that support a broad range of
purposes and business models.” It has developed a number of metadata specifications,
the most important and widely used of which is the Dublin Core Metadata Element
Set [1], which was approved as an ISO standard in 2003 [2].
DCMES and three other vocabularies are documented in [3]. DCMES itself
consists of 15 elements: contributor, coverage, creator, date, description, format,
identifier, language, publisher, relation, rights, source, subject, title, and type. These
are supplemented by 40 additional elements and element refinements. There is also a
vocabulary of 18 encoding schemes, one of which (the DCMI Type Vocabulary) is
also a Dublin Core (DC) vocabulary consisting of 12 terms.
In the DCMI abstract model [4], “resources” are “described” by metadata using
“property-value” pairs. ‘Resource’ is defined as “anything that might be identified”
and, although the concept of “being identified” is not defined, it is clear from the
examples that ‘resource’ is essentially the same as ‘subject’ in Topic Maps. The
resources to which Dublin Core metadata is assigned are almost always information
resources, but this need not be the case; they might, for example, be “human beings,
corporations, concepts [or] bound books in a library.”
In terms of the abstract model, the first two vocabularies mentioned above define
properties; the encoding schemes identify controlled vocabularies such as ISO 3166,
TGN and DCMI Box, that provide values for properties such as language, coverage,
etc.; and the DCMI Type Vocabulary defines a set of possible values (such as Text,
Service, Image, etc.) for the property type. Thus, two vocabularies define properties, one
defines schemes that are the source of values, and the fourth consists of actual values.
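As a rough illustration of the abstract model, a description can be pictured as a resource plus a list of property-value pairs. The sketch below uses Python rather than any DCMI notation; the class name `Description`, the resource name `the-document`, and the sample values are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Role played by each of the four vocabularies in the abstract model,
# as summarized in the paragraph above.
VOCABULARY_ROLE = {
    "Dublin Core Metadata Element Set": "properties",
    "Other Elements and Element Refinements": "properties",
    "Encoding Schemes": "sources of values",
    "DCMI Type Vocabulary": "values",
}


@dataclass
class Description:
    """One resource described by (property, value) pairs."""
    resource: str
    pairs: list = field(default_factory=list)

    def add(self, prop: str, value: str) -> "Description":
        self.pairs.append((prop, value))
        return self

    def values(self, prop: str) -> list:
        return [v for p, v in self.pairs if p == prop]


# Hypothetical resource and values, for illustration only.
doc = Description("the-document").add("type", "Text").add("language", "en")
print(doc.values("type"))
```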
L. Maicher and L.M. Garshol (Eds.): TMRA 2007, LNAI 4999, pp. 186–197, 2008.
© Springer-Verlag Berlin Heidelberg 2008
A number of proposals have been put forward over the years regarding how to use DC
with Topic Maps. This section summarizes and compares them.
Algermissen 2001 [9] was the first published (but unfinished) attempt to use
Dublin Core in a Topic Maps context. It is described as “a processing model for
HTML <meta> tags that make use of the Dublin Core element set” and consists of a
set of PSIs in the form of a topic map modeled on psi1.xtm² within the framework
of the Topic Maps Processing Model (TMPM4).
The topic map defines TMPM4 association templates that model 11 of the 15 core
DC metadata elements. The element creator is typical: topics represent the resource
and the value, and they are related via an association whose type corresponds to the
metadata element (‘creator’). Association types and role types are assigned PSIs in the
http://www.topicmapping.com/psi/dc/dc.html# namespace; role types also have the
DC identifiers as additional subject identifiers.
The following code shows how a creator property would be expressed:³
dcc:at-resource-creator(
dcc:role-resource : the-document,
dcc:role-creator : the-creator )
Nine other elements are treated in essentially the same way, the exception being
relation, which is modeled as an “untyped relationship between resources”. The
remaining four elements are modeled as follows:
¹ Note: The goal of the current work is to standardize the usage of all four vocabularies
documented in [3].
² psi1.xtm was the set of XTM 1.0 Published Subject Indicators (PSIs) published in
December 2000 by TopicMaps.Org but subsequently withdrawn. It can be retrieved via the
WayBack Machine at web.archive.org as http://www.topicmaps.org/xtm/1.0/psi1.xtm.
³ The notation used for examples is the Compact Topic Maps syntax, as defined in the draft of
2007-09-09 [10]. The syntax is not yet stable and changes are likely.
• title: an untyped base name is assigned to the topic that represents the
resource
• description: an occurrence of type ‘description’ is assigned to a topic that
represents the resource
• subject: an untyped occurrence whose value is the resource is assigned to
a topic that represents the subject
• identifier: [this element is marked as “to be done” with the comment “not
sure if and how to handle this”]
The model requires 23 published subject identifiers in order to cater for 12 of the 15
core elements (title does not require additional PSIs since it is modeled as a base
name, and the treatment of subject and identifier is incomplete).
To summarize: the Algermissen proposal uses names, associations and occurrences
to handle 14 of the 15 core elements; associations are used in two different ways
(typed, as in creator, and untyped, as in relation); occurrences are also used in two
different ways (the resource “owns” the occurrence in the case of description,
whereas the value “owns” it in the case of subject).
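The counts in this summary can be checked with a small tally. The sketch below is a Python illustration, not part of Algermissen's proposal. It assumes that every element not singled out in the text is a typed association like creator; that list of ten is inferred by elimination and is not given explicitly in the description above.

```python
DCMES = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]

# The five elements the text singles out; None marks the element
# that the proposal leaves as "to be done".
SPECIAL_CASES = {
    "relation": "untyped association",
    "title": "untyped base name",
    "description": "typed occurrence owned by the resource",
    "subject": "untyped occurrence owned by the subject",
    "identifier": None,
}

# Assumption: every element not singled out is a typed association
# like creator (inferred by elimination, not listed in the text).
algermissen = {e: SPECIAL_CASES.get(e, "typed association") for e in DCMES}

# Elements modeled with an association template (typed or untyped).
templates = [e for e, k in algermissen.items() if k and "association" in k]
# Elements for which some treatment is given at all.
handled = [e for e, k in algermissen.items() if k is not None]
print(len(templates), len(handled))
```

Under that assumption the tally agrees with the figures quoted: 11 elements use association templates and 14 of the 15 are handled in some way.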
Pepper 2003 [11], reproduced in Appendix A, is an example of an approach
pioneered by Ontopia and distributed as the file dc.ltmm with the Omnigator
browser. It can be summarized as follows:
• title is represented as an untyped topic name
• description, date and rights are represented as occurrences
• all other core elements (except identifier) are represented as associations
• there is no example for identifier so its intended use is unclear
• the DC identifiers (e.g. http://purl.org/dc/elements/1.1/creator) are used
as subject identifiers for occurrence types and association types
• PSIs are defined for the concepts ‘resource’ and ‘value’, respectively (in
the namespace http://psi.ontopia.net/metadata/#) for use as role types.
Maicher 2006 [12] represents a similar approach, also published in the form of a topic
map. It has the advantage of covering all four sets of terms documented in [3], but
since it lacks documentation and complete examples, there is no way of knowing
exactly how it is intended to work in practice. The two short examples in the topic
map (showing metadata assigned to the topic map itself, and to the topic representing
the subject “Der Schrei”) include the following mappings:
• associations: language, type, source
• occurrences: description, publisher, references, title
The examples given for creator and created (the latter an “element refinement” of
date) indicate that they can be treated as either occurrences or associations. Based on
this one can assume that the intention is for users to be able to choose whether to use
an occurrence or an association in any given instance.
Maicher follows Pepper in using the DC identifiers as subject identifiers, but also
defines a parallel set (in the http://psi.semports.org/dc/ namespace). He also follows
Pepper in defining two role types to be used across the board whenever a metadata
element is represented as an association; these are called ‘start’ and ‘end’. No
indication is given of how to handle the identifier metadata element.
3 Discussion
Assigning metadata to resources is equivalent to making statements about topics. It is
therefore natural to represent the assignment of property-value pairs as statements of
various kinds. In the case of the two DC vocabularies that define metadata elements
(and element refinements) the key issue is to decide whether to represent a given
property as a name, association, or occurrence.⁴ This section discusses the issues
involved before outlining a proposal for standardization.
A second issue is whether to mandate the use of one (and only one) kind of statement
for any given metadata element, or whether to allow some degree of flexibility,
perhaps coupled with certain admonitions concerning best practice. Algermissen and
Pepper adopt the former approach, and Maicher the latter.
The most obvious example of this question arising is with elements which should
ideally be represented as associations, but where it might sometimes be inconvenient
to represent the value of the property as a topic. Many elements carry comments
stating that “recommended best practice is to use a controlled vocabulary” (in which
case an association, with the property value represented as a topic, would clearly be
most appropriate). However, this advice may not always be followed, and the value
of the property in question might simply be entered as a string. In such cases one
might question the utility of creating a topic about which nothing is known except its
name: the situation is scarcely better than using a metadata property whose value is a
string.
⁴ Handling the two vocabularies that define values and schemes for encoding values is trivial
since terms belonging to these must clearly be mapped to topics.
⁵ At least using the Topic Maps Data Model. It would be less counter-intuitive at the level of
the Topic Maps Reference Model, which lacks the explicit semantics of the TMDM.
In our opinion, the importance of encouraging best practice outweighs this concern. By
requiring the use of associations in such cases, a point is made about the “right way”
to do things, and this will hopefully lead to the creation of more useful
topic maps. The current proposal therefore prescribes exactly which kind of
statement to use for each element, and mandates the use of associations in all but
exceptional cases.
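To make the consequence of this rule concrete, the following Python sketch shows a value entered as a plain string being promoted to a topic that then plays a part in an association. The function name `promote`, the ID scheme, and the tuple layout are all assumptions chosen for illustration; the proposal itself does not prescribe how topic IDs are derived.

```python
import re


def promote(value: str, topics: dict) -> str:
    """Return the ID of the topic named `value`, creating it if needed.

    `topics` maps topic ID -> name. The ID scheme (lower-cased name with
    runs of non-alphanumerics replaced by hyphens) is an assumption.
    """
    topic_id = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    topics.setdefault(topic_id, value)
    return topic_id


topics = {}
# A creator entered as a plain string still becomes a topic that is
# related to the resource through an association.
association = ("creator", "the-document", promote("Steve Pepper", topics))
print(association, topics)
```

Even though nothing but the name is known about the new topic, a second resource with the same creator string would resolve to the same topic ID rather than to a second, unrelated string.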
Given that associations are recommended for most elements (10 out of 15 in
the basic element set), it becomes necessary to define appropriate role types. While it
would be possible to define a different pair of role types for each association type,
there seems to be no good reason to do so.⁶ Following the principle of Occam’s
razor,⁷ the current proposal recommends using the same pair of role types for every
association type. These obviously need to be very generic; following Pepper (2003),
the subjects ‘resource’ and ‘value’ are proposed, since these correspond directly to
Dublin Core terminology:
%prefix dctm http://psi.topicmaps.org/iso29111/
dctm:resource - "Resource" .
dctm:value - "Value" .
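The effect of sharing one pair of role types can be sketched as a single helper that serves every association type. The Python below imitates the association notation used in the creator example earlier in the paper; it assumes the `dc` prefix is bound to the Dublin Core namespace, the topic IDs are hypothetical, and the output has not been checked against any CTM parser.

```python
RESOURCE_ROLE = "dctm:resource"
VALUE_ROLE = "dctm:value"


def dc_association(element: str, resource: str, value: str) -> str:
    """Render one DC property as an association in the paper's notation."""
    return (
        f"dc:{element}(\n"
        f"  {RESOURCE_ROLE} : {resource},\n"
        f"  {VALUE_ROLE} : {value} )"
    )


# The same two role types serve every association type.
print(dc_association("creator", "the-document", "the-creator"))
print(dc_association("publisher", "the-document", "the-publisher"))
```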
As we have seen, both Algermissen (2001) and Maicher (2006) create new PSIs for
every DC element (in addition to reusing the DC identifier), whereas Pepper (2003)
simply reuses the existing identifiers. In the absence of any stated reason for the
duplication it seems sensible to again apply Occam’s razor and avoid creating new
subjects unnecessarily. The current proposal therefore reuses the DC identifiers
without creating new ones (except for the two role types described above).
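A short Python sketch of what this reuse amounts to: prefixed names are expanded against two namespaces, one inherited from Dublin Core and one introduced by the proposal. The helper name `expand` and the choice of sample elements are assumptions for illustration.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dctm": "http://psi.topicmaps.org/iso29111/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:creator' to a full identifier."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


# Identifiers reused from Dublin Core versus newly minted by the proposal.
reused = [expand(f"dc:{e}") for e in ("creator", "title", "subject")]
minted = [expand("dctm:resource"), expand("dctm:value")]
print(reused[0])
print(minted)
```

However many elements are mapped, the list of newly minted identifiers stays at the two role types.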
All previous proposals have been unclear on the handling of the identifier element,
whose DC definition is “an unambiguous reference to the resource within a given
context”. The accompanying comment states that “recommended best practice is to
⁶ There are some who regard the reuse of role types across association types as bad practice.
They are invited to state their objections.
⁷ Entia non sunt multiplicanda praeter necessitatem (entities should not be multiplied
unnecessarily).
Based on the preceding discussion of issues raised by earlier work, the approach to
expressing DC metadata in Topic Maps proposed in this paper is as follows:
4.2 DCMES
%prefix dc http://purl.org/dc/elements/1.1/
# name types
title dc:title - "Title"
# occurrence types
date dc:date - "Date"
description dc:description - "Description"
identifier dc:identifier - "Identifier"
rights dc:rights - "Rights"
# association types
contributor dc:contributor - "Contributor"
coverage dc:coverage - "Coverage"
creator dc:creator - "Creator"
format dc:format - "Format"
language dc:language - "Language"
publisher dc:publisher - "Publisher"
relation dc:relation - "Relation"
source dc:source - "Source"
subject dc:subject - "Subject"
type dc:type - "Type"
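The declarations above amount to a fixed lookup from element to statement kind. As a hedged sketch in Python (the function name `classify` and the sample record are assumptions, and the language value is a placeholder), a flat metadata record could be sorted into names, occurrences and associations like this:

```python
STATEMENT_KIND = {
    "title": "name",
    "date": "occurrence",
    "description": "occurrence",
    "identifier": "occurrence",
    "rights": "occurrence",
}
ASSOCIATION_ELEMENTS = (
    "contributor", "coverage", "creator", "format", "language",
    "publisher", "relation", "source", "subject", "type",
)
STATEMENT_KIND.update({e: "association" for e in ASSOCIATION_ELEMENTS})


def classify(record: dict) -> dict:
    """Group a flat {element: value} record by prescribed statement kind."""
    grouped = {"name": {}, "occurrence": {}, "association": {}}
    for element, value in record.items():
        grouped[STATEMENT_KIND[element]][element] = value
    return grouped


# Hypothetical record; the language value is a placeholder.
record = {"title": "DCMI Metadata Terms", "date": "2007-07-04", "language": "en"}
print(classify(record))
```

Because each element has exactly one entry in the table, there is no choice to make at conversion time, which is the point of the prescriptive approach.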
The term set Other Elements and Element Refinements defines additional properties,
many of which are refinements of the core 15 elements of the DCMES. Element
refinements are represented in the same way as the element that they refine. The
remainder are represented according to the principle given in the fourth bullet of
section 3.6, above:
This association represents the dc:subject property whose value (represented by the
topic with the ID Opera) is a term from the Library of Congress Subject Headings
encoding scheme.
The terms in the DCMI Type Vocabulary are represented as topics that play the role
of ‘value’ in associations of type dc:type:
%prefix dctype http://purl.org/dc/dctype/
Collection dctype:Collection - "Collection"
Dataset dctype:Dataset - "Dataset"
Event dctype:Event - "Event"
Image dctype:Image - "Image"
InteractiveResource dctype:InteractiveResource - "Interactive Resource"
MovingImage dctype:MovingImage - "Moving Image"
PhysicalObject dctype:PhysicalObject - "Physical Object"
Service dctype:Service - "Service"
Software dctype:Software - "Software"
Sound dctype:Sound - "Sound"
StillImage dctype:StillImage - "Still Image"
Text dctype:Text - "Text"
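Since the type vocabulary is a closed list of twelve terms, a value for dc:type can be checked before a topic is created for it. The Python sketch below is one possible way to do so; the function name `type_topic` and the derivation of the display name from the term are assumptions, although the derived names match the declarations above.

```python
import re

DCTYPE = "http://purl.org/dc/dctype/"
DCMI_TYPES = (
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
)


def type_topic(term: str) -> tuple:
    """Return (subject identifier, display name) for a DCMI Type term."""
    if term not in DCMI_TYPES:
        raise ValueError(f"not a DCMI Type term: {term!r}")
    # Insert a space at each lower-to-upper case boundary.
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term)
    return DCTYPE + term, name


print(type_topic("MovingImage"))
```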
4.6 Example
The following example shows how to apply the preceding declarations in order to
assign metadata to a topic map. We assume they are collected in a separate file called
dc-declarations.ctm which is included in our topic map. The topic map is reified and
statements are made about the reifying topic using the Dublin Core vocabulary:
%include dc-declarations.ctm
%prefix o http://psi.ontopedia.net/
# reify the topic map
~ topicmap
topicmap
- title : "DCMI Metadata Terms"
date : 2007-07-04
description : "A Topic Maps representation of DCMI Metadata Terms"
source : http://dublincore.org/documents/dcmi-terms/
isReplacedBy : "Not applicable"
The difference between these two examples may seem subtle but it can be extremely
significant. In the 2-legs example, the values of the subject, creator and contributor
properties are strings; in the 4-legs example, they are topics. For a single resource
description viewed in isolation, the latter provides little extra value and involves some
overhead. But for a description set consisting of multiple descriptions, the advantages of
the subject-centric approach soon become apparent. The subjects “Subject-centric
computing”, “Dublin Core”, “Steve Pepper” and “Dmitry Bogachev” represented by the
topics in the 4-legs example become additional points of collocation and thus improve
findability. Each of them can potentially be associated with any number of other
subjects in multiple ways and thus new navigation paths become available. And each of
these subjects offers a new potential for linking this topic map with information in other,
related topic maps that may not necessarily represent Dublin Core metadata.
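The collocation effect described here can be illustrated with a small inverted index. In the Python sketch below the resource names, topic IDs and their pairings are invented for illustration and are not taken from the paper's examples; the only point is that values represented as topics can be looked up from the other direction.

```python
from collections import defaultdict

# Hypothetical description set: each resource lists (property, topic ID)
# pairs whose values are topics rather than strings.
descriptions = {
    "paper-a": [("creator", "steve-pepper"), ("subject", "dublin-core")],
    "paper-b": [("creator", "steve-pepper"), ("subject", "topic-maps")],
    "paper-c": [("contributor", "dmitry-bogachev"), ("subject", "dublin-core")],
}


def collocate(descriptions: dict) -> dict:
    """Invert the description set: topic ID -> resources related to it."""
    index = defaultdict(set)
    for resource, pairs in descriptions.items():
        for _prop, topic_id in pairs:
            index[topic_id].add(resource)
    return dict(index)


index = collocate(descriptions)
print(sorted(index["dublin-core"]))
print(sorted(index["steve-pepper"]))
```

With string values the same grouping would depend on exact string matches; with topics, each value is a single node from which all related resources can be reached.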
7 Conclusions
This paper has discussed previous work and the issues involved in representing
Dublin Core metadata in Topic Maps. It has proposed a unified approach suitable for
standardization by ISO and pointed out some of the advantages of the subject-centric
approach.
References
[1] Dublin Core Metadata Element Set, Version 1.1. DCMI Recommendation (December 18,
2006), http://www.dublincore.org/documents/dces/
[2] ISO 15836-2003. Information and documentation — The Dublin Core metadata element
set. ISO, Geneva (2003)
[3] DCMI Metadata Terms. DCMI Recommendation (December 18, 2006),
http://www.dublincore.org/documents/dcmi-terms/
[4] Dublin Core Abstract Model. DCMI Recommendation (June 6, 2007),
http://www.dublincore.org/documents/abstract-model/
[5] Expressing Dublin Core in HTML/XHTML meta and link elements. DCMI
Recommendation (November 11, 2003),
http://dublincore.org/documents/dcq-html/
[6] Guidelines for implementing Dublin Core in XML. DCMI Recommendation (April 2,
2003), http://dublincore.org/documents/dc-xml-guidelines/
[7] Expressing Simple Dublin Core in RDF/XML. DCMI Recommendation (July 31, 2002),
http://dublincore.org/documents/dcmes-xml/
[8] Pepper, S.: NP for Technical Report - Information Technology - Topic Maps - Expressing
Dublin Core Metadata using Topic Maps (2006), ISO/IEC JTC 1/SC 34 N0758,
http://www.jtc1sc34.org/repository/0758.htm
[9] Algermissen, J.: A Processing Model for HTML using Dublin Core (November 19,
2001), http://www.topicmapping.com/psi/dc/dc.html
[10] Heuer, L., Hopmans, G., Oh, S., Pepper, S.: ISO/IEC CD 13250-6 Information
Technology - Topic Maps - Compact Syntax (CTM). ISO/IEC JTC 1/SC 34 N0905
(September 9, 2007), http://www.jtc1sc34.org/repository/0905.htm
[11] Pepper, S.: LTM example for Dublin Core metadata. v 1.4 dc-example.ltm (September 9,
2003)
[12] Maicher, L.: Dublin Core Metadata Terms. Topic Map (2006),
http://www.informatik.uni-leipzig.de/~maicher/topicmaps/DCMT.ltm
[13] Pepper, S.: DCMI Metadata Terms Topic Map,
http://www.ontopedia.net/omnigator/models/topicmap_nontopoly.jsp?tm=DublinCore.ltm
Appendix A
/* LTM example for Dublin Core metadata
------------------------------------
$Id: dc-example.ltm,v 1.4 2003/09/18 12:55:46 pepper Exp $ */
#INCLUDE "dc.ltmm"
#TOPICMAP this-tm
[tm-topic = "LTM Dublin Core Example" @"#this-tm"]