An Introduction To Topic Maps: Journal 5


The Architecture Journal: Journal 5


An Introduction to Topic Maps


Kal Ahmed and Graham Moore
July 2005

Applies to: Enterprise architecture, Topic maps

Summary: This article introduces the ISO international standard Topic Maps. The topic maps paradigm describes a way in which complex relationships between abstract concepts and real-world resources can be described and interchanged using a standard XML syntax. (15 printed pages)

Contents
Introduction Footnotes References

Introduction
This article introduces the ISO international standard Topic Maps. The topic maps paradigm describes a way in which complex relationships between abstract concepts and real-world resources can be described and interchanged using a standard XML syntax.

Topic Map History
Topic maps were originally developed in the late 1990s as a way to represent back-of-book index structures so that multiple indexes from different sources could be merged. However, the developers quickly realized that with a little additional generalization, they could create a meta-model with potentially far wider application. The result of that work was published in 1999 as ISO/IEC 13250, Topic Navigation Maps. In addition to describing the basic model of topic maps and the requirements for a topic map processor, the first edition of ISO 13250 included an interchange syntax based on SGML and the hypermedia linking language known as HyTime. The second edition, published in 2002 [1], added an interchange syntax based on XML and XLink. This is the syntax with the widest support in topic map processing products to date, and it is the syntax that we describe in this article. Today there are a number of implementations of the standard, both open-source and proprietary, for a number of languages and platforms, including the .NET platform.

Topic Map Fundamentals
The core of topic maps can be summarized very succinctly: a topic map consists of a collection of topics, each of which represents some concept. Topics are related to each other by associations, which are typed n-ary combinations of topics. A topic may also be related to any number of resources by its occurrences. Figure 1 shows the three fundamentals of topic maps. It also shows how the distinction between topic-to-topic and topic-to-resource relationships enables a partitioning of the model into a topic space, which contains only topics and the associations between them, and a resource space, which contains the resources related to topics.
http://www.microsoft.com/architecture/library.aspx?pid=journal.5&id=msdn.microsoft.com... 3/7/2006

This partitioning is interesting because it allows a topic map developed for one set of resources to be repurposed to index a different set of resources. In this way the topic map can be considered a portable form of knowledge. Unlike domain-specific models, the topic map model has no predefined set of types. Instead, individual topic map authors, or groups of authors in a community of practice, can define the model for their domain of interest and share those models with authors from other domains.
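
As a rough illustration of the three fundamentals and the topic/resource partition, the following Python sketch models topics, associations, and occurrences as plain data classes. The class and method names (Topic, Association, Occurrence, TopicMap, reindex) and the example URLs are hypothetical. They are not taken from the standard or from any toolkit. The sketch shows one thing: occurrences are the only link into resource space. As a result, repurposing a topic map for a different set of resources means swapping its occurrences while the topic space stays untouched.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Topic:
    """Represents exactly one concept (the topic's subject)."""
    id: str
    names: list[str] = field(default_factory=list)


@dataclass
class Association:
    """A typed, n-ary relationship; each member plays a typed role."""
    type: Topic
    members: list[tuple[Topic, Topic]]  # (role type, role player) pairs


@dataclass
class Occurrence:
    """Relates a topic to a resource outside the topic space."""
    topic: Topic
    type: Topic
    resource: str  # in this sketch, always a URI string


@dataclass
class TopicMap:
    topics: list[Topic] = field(default_factory=list)
    associations: list[Association] = field(default_factory=list)
    occurrences: list[Occurrence] = field(default_factory=list)

    def topic_space(self):
        """Topics and topic-to-topic associations only."""
        return self.topics, self.associations

    def resource_space(self):
        """The resources reached through occurrences."""
        return {occ.resource for occ in self.occurrences}

    def reindex(self, occurrences):
        """Return a copy with the same topic space but new occurrences."""
        return replace(self, occurrences=list(occurrences))


# Hypothetical data: one company topic with a "homepage" occurrence.
homepage = Topic("homepage", ["Home page"])
abc = Topic("abc-limited", ["ABC Limited"])
tm = TopicMap(topics=[homepage, abc],
              occurrences=[Occurrence(abc, homepage, "http://abc.example/")])

# Repurpose the same topic space to index a different (hypothetical) resource set.
intranet_tm = tm.reindex([Occurrence(abc, homepage, "http://intranet.example/abc")])
```

Here `reindex` leaves the original map alone and returns a new one that shares its topics and associations.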

Figure 1. Topics, associations, and occurrences

User Benefits
We believe that for many end users, a good topic maps application will conceal much, if not all, of the topic maps mechanism, allowing users to concentrate instead on the domain model(s) that they work with. However, the topic maps model and the Topic Maps standard do provide a number of benefits that can be surfaced in applications and can serve as unique selling points.

Simple Organizational Metaphor


The core topic maps metaphor of topics, occurrences, and associations strikes a balance between being compact and easy to understand and providing enough basic infrastructure to allow users to translate their mental model of a domain into a topic map model. Other forms of data and information organization, such as RDF and the relational model, may have an even simpler model, but they then require the user to build infrastructure for common tasks such as labeling an item with names, defining a class structure, or creating n-ary relationships between items.

Domain/Resource Separation
As described above, the topic maps model draws a clear distinction between the domain model, expressed as topics and associations between topics, and the indexed resources, expressed as occurrences that link topics to resources. Three major benefits can be derived from this structure. First, the topic map can act as a high-level overview of the domain knowledge contained in a set of resources. In this way the topic map can serve not only as a guide to locating resources for the expert, but also as a way for experts to model their knowledge in a structured way. This allows non-experts to grasp the basic concepts and their relationships before diving into the resources that provide more detail.


reference to. For example, a person may have any number of database records, online biographies, or pictures about them, but none of those addressable resources is the person; they are merely some form of descriptor for the person. In the topic map standard, this form of identifier is known as a subject identifier, and the resource that the subject identifier resolves to is known as a subject indicator. Topic maps allow the use of URI references to such descriptive resources as a form of identity. Obviously it is important that the topic map author chooses unambiguous descriptive resources for this purpose, and this is an issue that we will return to later.

The distinction between the latter two of these forms of identity can be confusing. Consider the URL http://www.networkedplanet.com/about/index.html. This is a Web page that describes the company NetworkedPlanet. So this URL could be used as the subject identifier for a topic named "The company NetworkedPlanet," because it resolves to a resource that describes the concept of the company. However, if we wanted to talk about the concept "The 'About' page on the website www.networkedplanet.com," we would want a topic whose subject really is the resource at the address http://www.networkedplanet.com/about/index.html, and so we would use the same URI as a subject locator.

The key difference between a subject identifier and a subject locator is that a subject identifier requires human interpretation of a resource to determine the concept that a topic represents, whereas a subject locator points directly at the resource that is itself the subject of the topic. This is shown in Figure 2. The wide solid arrow shows the use of a resource as a subject locator. The thin solid arrow shows the use of the same resource as a subject identifier. The thin dashed arrows show the role of the human being in the interpretation of a subject identifier.
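
The following hypothetical Python sketch makes the distinction concrete. It stores the two kinds of identity in separate sets and compares them only like-for-like. As a result, the same URI used as a subject identifier on one topic and as a subject locator on another does not make the two topics the same. The Topic class and the same_subject function are illustrative assumptions, not an API defined by the standard.

```python
from dataclasses import dataclass, field

ABOUT_PAGE = "http://www.networkedplanet.com/about/index.html"


@dataclass
class Topic:
    name: str
    # URIs of resources that *describe* the subject (the subject indicators).
    subject_identifiers: set[str] = field(default_factory=set)
    # URIs of resources that *are* the subject.
    subject_locators: set[str] = field(default_factory=set)


def same_subject(a, b):
    """Compare like with like: an identifier never matches a locator."""
    return (bool(a.subject_identifiers & b.subject_identifiers)
            or bool(a.subject_locators & b.subject_locators))


# One URI, two different subjects, depending on how it is used.
company = Topic("The company NetworkedPlanet", subject_identifiers={ABOUT_PAGE})
about_page = Topic("The 'About' page on www.networkedplanet.com",
                   subject_locators={ABOUT_PAGE})
```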
Although a single topic can have many forms of identity, it is important to note that each separate identifier can resolve to only one topic. The merging rules of topic maps (described later) enforce this one-to-many relationship between topics and their identifiers.

In addition to these forms of identity, a topic can also have any number of types and any number of names. The types of a topic define the class (or classes) to which the concept represented by the topic belongs. Types are treated in topic maps as concepts in their own right; hence every type is represented by a topic. The type of a topic is specified simply by a privileged form of relationship between the topic that represents the instance and the topic that represents the type.

The names of a topic define a set of labels for the topic. Every name has a hierarchical structure. At the root is the base name, which has a string representation. It is the base name string value that is used to determine topic identity by label. A base name is also a container for any number of alternate forms (known as variant names). The alternate forms of a name may be either string values or references to resources, allowing representations such as icons or sound clips to be referenced as variant names. Base names and variant names can be given a context (or scope) in which they are valid, allowing a topic map-aware application to select the best name for presentation to a user in a given situation. We will cover scope later.
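
Here is a minimal sketch of types and names, assuming that topics are plain Python objects. The type of a topic is simply a reference to another topic. Each base name carries its string value plus a list of variants, where a variant is either a string or a resource URI. The names Topic, BaseName, and instances_of and the icon URI are hypothetical. Variant parameters and scope are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class BaseName:
    string: str                                        # used for identity by label
    variants: list[str] = field(default_factory=list)  # strings or resource URIs


@dataclass(eq=False)  # compare topics by object identity, not field values
class Topic:
    id: str
    types: list["Topic"] = field(default_factory=list)  # every type is itself a topic
    base_names: list[BaseName] = field(default_factory=list)


def instances_of(topics, type_topic):
    """Follow the privileged instance-of relationship."""
    return [t for t in topics if type_topic in t.types]


# Hypothetical example data.
person = Topic("person", base_names=[BaseName("Person")])
band = Topic("band", base_names=[BaseName("Band")])
clash = Topic("clash", types=[band], base_names=[BaseName("The Clash")])
joe = Topic("joe-strummer", types=[person], base_names=[
    BaseName("Joe Strummer", variants=["Strummer, Joe",                     # sort form
                                       "http://example.org/img/joe.png"]),  # icon (hypothetical)
    BaseName("Joseph Mellor"),
])
topics = [person, band, clash, joe]
```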

Associations
Associations are the general form for representing relationships between topics in a topic map. An association can be thought of as an n-ary aggregate of topics. That is, an association is a grouping of topics with no implied direction or order, and there is no restriction on the number of topics that can be grouped together. An association can be assigned a type (again defined by a topic) that specifies the nature of the relationship represented by the association. In addition, each topic that participates in the association plays a typed role that specifies the way in which the topic participates. For example, to describe the relationship between a person, "John Smith," and the company he works for, "ABC Limited," we would create an association typed by the topic "Employment" and with role types "Employee" (for the role played by "John Smith") and "Employer" (for the role played by "ABC Limited"). Like names, an association can be assigned a scope in which it is valid, and which may be used by a topic map-aware application to determine whether or not to display the information represented by the association to a user in a given situation.
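
As a sketch of how an application might navigate an association, the following Python represents each association as a type plus an unordered set of (role type, player) pairs. It answers questions such as "who employs John Smith?" by matching roles rather than positions, because associations have no implied order. Topics are reduced to plain strings here, and the players function is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    type: str
    members: frozenset  # {(role type, player)}: no implied direction or order
    scope: frozenset = frozenset()  # empty = unconstrained scope


def players(associations, assoc_type, known_role, known_player, wanted_role):
    """Players of `wanted_role` in associations of `assoc_type`
    in which `known_player` plays `known_role`."""
    found = []
    for assoc in associations:
        if assoc.type != assoc_type or (known_role, known_player) not in assoc.members:
            continue
        found.extend(sorted(p for r, p in assoc.members if r == wanted_role))
    return found


employment = Association("Employment", frozenset({
    ("Employee", "John Smith"),
    ("Employer", "ABC Limited"),
}))

# n-ary: one association, three roles (one player may play several roles).
membership = Association("Membership", frozenset({
    ("Group", "The Clash"),
    ("Singer", "Joe Strummer"),
    ("Guitarist", "Joe Strummer"),
}))

associations = [employment, membership]
```

The same helper works in either direction. For example, asking for the "Employee" of "ABC Limited" returns John Smith.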

Occurrences
Occurrences are used to represent or refer to information about a concept represented by a topic. Occurrences can be used either to store string data within the topic map, or to reference any kind of Web-addressable resource external to the topic map. No restriction is placed on the type of resource addressed by an occurrence. It may be a static HTML page, an HTML page generated by ASP, a Web service, or any other type of resource. Nor are occurrences restricted to the HTTP protocol; any address encoded as a URI can be used to address an external resource. Once again, occurrences can be typed, using a topic to express the occurrence type, and a scope of validity can also be assigned to an occurrence.
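
Here is a hypothetical Python sketch of typed occurrences. Each occurrence is either inline string data or a reference to an external resource, and nothing restricts a reference to HTTP. The Occurrence class, the occurrence type names, and the example addresses are all illustrative assumptions. The only library call used is the standard urllib.parse.urlparse.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Occurrence:
    type: str           # occurrence type (a topic id in a real topic map)
    value: str          # inline string data, or a URI
    is_reference: bool  # True if `value` addresses an external resource
    scope: frozenset = frozenset()


def reference_schemes(occurrences):
    """URI schemes used by resource references; no scheme is privileged."""
    return sorted({urlparse(o.value).scheme for o in occurrences if o.is_reference})


def of_type(occurrences, occ_type):
    return [o for o in occurrences if o.type == occ_type]


# Hypothetical occurrences for the topic "ABC Limited".
abc_occurrences = [
    Occurrence("homepage", "http://abc.example/", True),
    Occurrence("price-list", "http://abc.example/prices.asmx", True),  # a Web service
    Occurrence("archive", "ftp://files.abc.example/reports/", True),
    Occurrence("contact", "mailto:info@abc.example", True),
    Occurrence("description", "Manufacturer of widgets", False),       # inline data
]
```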

Scope
Scope is the term used in the topic map standard to refer to a constraint or a context in which something is said about a topic. Such statements about topics are made by adding a name to the topic, by specifying an occurrence for the topic, or by creating an association between topics (in which case the statement applies to all of the topics in the association). Many statements are not universally true but depend upon a context. For example, we make statements such as "ABC Limited was top vendor of widgets in Q2 2004," or "Fred says that ABC Limited is a good investment." Here "in Q2 2004" supplies a temporal context, and "Fred says that" supplies a quotation context. More prosaically, context is often used to support multilingual interfaces, so the concept "Dog" may have the label "dog" in the context of the English language, "le chien" in French, and "der Hund" in German.

In a topic map, scope is defined by a set of topics that can be assigned to a name, an occurrence, or an association. The default scope (where no set is assigned) is known as the unconstrained scope and simply means that the name, occurrence, or association is always valid. When a topic map-aware application encounters a name, occurrence, or association that has a scope assigned to it, the application should compare what it knows about the current operating context against the scope information contained in the topic map, to determine whether the construct is valid and whether it should be presented to the user. In the current edition of ISO 13250, the mechanism for processing scope against an application context is not constrained by the standard. Many topic map developers see this as a shortcoming, because it can make it more difficult to exchange topic maps that use scope.
The next revision to the standard will recommend that a scope that consists of multiple topics should be processed such that the scoped construct is valid only if the application determines that all of the topics in the scope apply to the current application context.
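
The all-topics-must-apply rule recommended for the next revision can be sketched in a few lines of Python. For illustration only, the sketch assumes that both a scope and the application's current context are sets of topic identifiers. Under that assumption, a construct is valid when its scope is a subset of the context. The unconstrained scope is the empty set, so it is valid everywhere. The Name class, the context labels, and the extra names "doggy" and "Canis familiaris" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    value: str
    scope: frozenset = frozenset()  # empty set = the unconstrained scope


def in_scope(scope, context):
    """Valid only if every topic in the scope applies to the current context.
    The empty (unconstrained) scope is a subset of any context: always valid."""
    return scope <= context


def visible(names, context):
    return [n.value for n in names if in_scope(n.scope, context)]


# Hypothetical names for the topic "Dog".
dog_names = [
    Name("dog", frozenset({"english"})),
    Name("doggy", frozenset({"english", "informal"})),  # needs BOTH topics
    Name("le chien", frozenset({"french"})),
    Name("der Hund", frozenset({"german"})),
    Name("Canis familiaris"),                           # unconstrained
]

print(visible(dog_names, frozenset({"english"})))
print(visible(dog_names, frozenset({"english", "informal"})))
```

A set-based context is only one possible design. An application could equally derive the context from user preferences or request headers before running the same subset test.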


Figure 3. The Structure of an Association

Topic Merging
Automatic topic merging is a key feature of topic maps and one that brings many benefits to topic map development and to applications that use topic maps for managing and exchanging data. The principle behind topic merging is that in any given topic map, each subject described by the topic map must be represented by one and only one topic. This means that it is the responsibility of the topic map processor to identify situations in which two topics represent the same subject and to process them so that only one topic remains. This is the process of merging. Identifying when two topics represent the same subject is achieved by applying heuristics. The topic maps standard defines a set of basic heuristics:

1. If two topics share the same source locator, then they have been parsed from the same topic map source and must be considered to represent the same concept.
2. If two topics have the same subject locator, then they both identify the same network resource as being the thing that they represent.
3. If two topics have the same subject indicator, then they are both using the same resource to describe the concept that they represent and must be considered to represent the same concept.
4. If two topics each have a base name with the same string representation, and the scopes of the two base names are the same set of topics, then the topics must be considered to represent the same concept.
5. Finally, a topic map application may make use of any domain-specific information it has to determine that two topics represent the same concept.

Item (3) in the list above highlights the importance of selecting a good resource as the description for a concept. If the description is ambiguous, or if the resource addressed is not well-defined enough, two different topic map authors might use the same resource as a descriptor for different concepts, leading to undesired merging. In our experience, good resources for subject descriptors are ones created specifically to describe a single subject, such as the pages at wikipedia.org, or pages created by the topic map author(s) or by a community of practitioners to define a controlled vocabulary.
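
The following Python is a minimal sketch of merging under heuristics (1) to (4). An optional hook stands in for the domain-specific criteria of item (5) in the list above. The sketch groups topics that share any identity key using a union-find structure, which makes merging transitive. Each resulting topic receives the union of its members' properties. The Topic fields, the merge_all function, the alias table, and the example URIs are assumptions for illustration. A real processor would also replace references to the merged topics in associations, scopes, and types; this sketch omits that step.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Topic:
    source_locators: set = field(default_factory=set)
    subject_locators: set = field(default_factory=set)
    subject_identifiers: set = field(default_factory=set)
    base_names: set = field(default_factory=set)  # {(string, frozenset(scope topics))}
    occurrences: set = field(default_factory=set)


def identity_keys(topic):
    """Hashable keys for heuristics (1)-(4); sharing any key means 'same subject'."""
    yield from (("source", uri) for uri in topic.source_locators)
    yield from (("locator", uri) for uri in topic.subject_locators)
    yield from (("indicator", uri) for uri in topic.subject_identifiers)
    yield from (("name", name) for name in topic.base_names)  # same string AND scope


def merge_all(topics, extra_keys=lambda topic: ()):
    """Merge topics sharing any identity key; `extra_keys` models heuristic (5)."""
    parent = list(range(len(topics)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # first topic index seen with each key
    for i, topic in enumerate(topics):
        for key in [*identity_keys(topic), *extra_keys(topic)]:
            parent[find(i)] = find(owner.setdefault(key, i))

    groups = {}
    for i, topic in enumerate(topics):
        groups.setdefault(find(i), []).append(topic)

    merged = []
    for members in groups.values():
        result = Topic()
        for topic in members:  # union of every identifier, name and occurrence
            for f in fields(Topic):
                getattr(result, f.name).update(getattr(topic, f.name))
        merged.append(result)
    return merged


# Hypothetical example data.
PSI = "http://psi.example.org/topic-maps"
a = Topic(subject_identifiers={PSI}, occurrences={"http://a.example/intro"})
b = Topic(subject_identifiers={PSI}, base_names={("Topic Maps", frozenset())})
c = Topic(base_names={("Topic Maps", frozenset())}, occurrences={"http://c.example/notes"})
d = Topic(base_names={("Topic Maps", frozenset({"french"}))})  # different scope

merged = merge_all([a, b, c, d])  # a+b share an identifier, b+c share a name

# Heuristic (5): a hypothetical application-specific alias table.
ALIASES = {"The Duke": "John Wayne"}
duke = Topic(base_names={("The Duke", frozenset())})
wayne = Topic(base_names={("John Wayne", frozenset())})
actors = merge_all([duke, wayne],
                   extra_keys=lambda t: [("actor", ALIASES.get(s, s)) for s, _ in t.base_names])
```

Note that topic `a` and topic `c` share no key directly. They still end up in one topic because each shares a key with `b`.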


Figure 4. Topic Merging

Item (4) has proven to be controversial in the topic map community, as it relies on what many consider to be a relatively weak form of identity: the name for a concept in some language. The mapping of words in a language to concepts is a complex affair, complicated by single words that carry several different meanings (homonyms), not to mention localization challenges. In the next version of the ISO standard, the restrictions on name-based identity will be tightened still further, requiring an author to explicitly flag a topic name as one that should be used to confer identity (the default being that a name does not confer identity on its topic).

Item (5) allows applications to extend the Topic Maps standard's set of merging criteria with application-specific criteria. These could include criteria based on more than a straightforward string or URI comparison. For example, an application might know that "The Duke" and "John Wayne" are names for the same actor and merge two topics on that basis.

Once the topics to be merged have been identified, the standard defines the process of replacing those two (or more) topics with a single topic. The single topic that results from the merging process has all of the identifiers, names (including variant names), and occurrences of the topics that are merged. In addition, the result topic replaces the merged topics wherever they are referenced (that is, in any associations, scopes, or types that they


    <baseNameString>Person</baseNameString>
  </baseName>
</topic>
<!-- Similarly for membership, group, singer and guitarist -->

<!-- The Clash is a Band -->
<topic id="clash">
  <instanceOf>
    <topicRef xlink:href="#band"/>
  </instanceOf>
  <baseName>
    <baseNameString>The Clash</baseNameString>
  </baseName>
</topic>

<!-- Joe Strummer is a Person (note multiple names) -->
<topic id="joe-strummer">
  <instanceOf>
    <topicRef xlink:href="#person"/>
  </instanceOf>
  <baseName>
    <scope>
      <topicRef xlink:href="#stage-name"/>
    </scope>
    <baseNameString>Joe Strummer</baseNameString>
  </baseName>
  <baseName>
    <baseNameString>Joseph Mellor</baseNameString>
  </baseName>
</topic>

<!-- Joe Strummer is a member of The Clash.
     Note separate member elements used for the different roles played -->
<association>
  <instanceOf>
    <topicRef xlink:href="#membership"/>
  </instanceOf>
  <member>
    <roleSpec>
      <topicRef xlink:href="#group"/>
    </roleSpec>
    <topicRef xlink:href="#clash"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="#singer"/>
    </roleSpec>
    <topicRef xlink:href="#joe-strummer"/>
  </member>
  <member>
    <roleSpec>
      <topicRef xlink:href="#guitarist"/>
    </roleSpec>
    <topicRef xlink:href="#joe-strummer"/>
  </member>
</association>
</topicMap>
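
A generic XML parser can read an XTM document directly, provided the document contains no topics that still need merging. As a hedged sketch, the following Python applies the standard library's xml.etree.ElementTree to a trimmed, self-contained excerpt modeled on the listing above. The excerpt and the read_topics helper are illustrative and are not part of the specification. The sketch assumes the XTM 1.0 namespace http://www.topicmaps.org/xtm/1.0/ and the XLink namespace. It does no merging of its own: if two topic elements described the same subject, it would report them separately.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
XLINK = "{http://www.w3.org/1999/xlink}"

# A trimmed, hypothetical excerpt in the style of the listing above.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="person">
    <baseName><baseNameString>Person</baseNameString></baseName>
  </topic>
  <topic id="joe-strummer">
    <instanceOf><topicRef xlink:href="#person"/></instanceOf>
    <baseName><baseNameString>Joe Strummer</baseNameString></baseName>
    <baseName><baseNameString>Joseph Mellor</baseNameString></baseName>
  </topic>
</topicMap>"""


def read_topics(xml_text):
    """Map topic id -> (type ids, base name strings). No merging is performed."""
    root = ET.fromstring(xml_text)
    topics = {}
    for topic in root.iter(XTM + "topic"):
        types = [ref.get(XLINK + "href").lstrip("#")
                 for ref in topic.findall(f"{XTM}instanceOf/{XTM}topicRef")]
        names = [n.text for n in topic.findall(f"{XTM}baseName/{XTM}baseNameString")]
        topics[topic.get("id")] = (types, names)
    return topics


print(read_topics(DOC))
```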

We will not go into the details of the syntax here. The interested reader is referred to the original XML Topic Maps specification [2] produced by TopicMaps.org (which was subsequently adopted by ISO). Note that the XTM syntax does not impose the merging constraints that are required of a topic map processor. This makes XTM easy to create, but it requires any processor that reads an XTM file to detect topics that should be merged and to apply the merging rules as the file is parsed. When an XTM file is known to be "fully merged" (that is, it does not contain separate topic elements representing topics that should be merged), the topic map model that it contains can easily be accessed using standard XML processing tools such as XSLT and XQuery. However, standard XML processing tools cannot easily be applied to XTM files that still require merging. Despite the issues with merging, the XTM syntax serves the basic need of allowing interchange between conformant topic map processing applications. In addition, the syntax and merging rules together are flexible enough to allow even parts of a topic map to be serialized as separate XTM documents and later recombined through merging [3].

Topic Map Patterns
As we have hopefully demonstrated up to this point, the Topic Maps standard provides a very flexible base architecture for a wide variety of information and knowledge management applications. This flexibility can lead to confusion and constant reinvention of basic modeling approaches. To address this issue, we advocate the development and use of patterns within topic map applications. We divide patterns into two broad categories: Topic Map Design Patterns, which are patterns for modeling topic map data, and Topic Map Application Patterns, which are architectural patterns for the use of topic map processing systems.

Topic Map Design Patterns


The basic concept of a Topic Map Design Pattern borrows heavily from design patterns in software engineering. A Topic Map Design Pattern provides a focused and reusable ontology that addresses a single issue. There are a couple of interesting differences, however. A Topic Map Design Pattern can be more prescriptive than a software design pattern, as it should specify the subject


"metaontology" rather than modified. This pattern enables an application to process a set of associations between topics as representing a hierarchy. For example, it may display the topics arranged into a tree view.

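Here is a minimal sketch of the hierarchy idea. It assumes hierarchy associations of a single hypothetical type ("broader-narrower") whose role types ("broader", "narrower") identify parent and child. The Python builds a parent-to-children index from those associations only, ignores unrelated associations, and renders an indented tree like the tree view mentioned above. It also assumes the hierarchy is acyclic. All association types, role types, and topic names are illustrative and are not taken from the pattern's own ontology.

```python
from collections import defaultdict

# Hypothetical associations: (association type, {role type: player}).
associations = [
    ("broader-narrower", {"broader": "Music", "narrower": "Rock"}),
    ("broader-narrower", {"broader": "Rock", "narrower": "Punk"}),
    ("broader-narrower", {"broader": "Music", "narrower": "Jazz"}),
    ("membership", {"group": "The Clash", "singer": "Joe Strummer"}),  # ignored
]


def build_tree(assocs, assoc_type, parent_role, child_role):
    """Index children by parent and find the roots (topics with no parent)."""
    children = defaultdict(list)
    has_parent = set()
    for a_type, roles in assocs:
        if a_type == assoc_type:
            children[roles[parent_role]].append(roles[child_role])
            has_parent.add(roles[child_role])
    roots = [topic for topic in children if topic not in has_parent]
    return roots, children


def render(nodes, children, depth=0):
    """Depth-first, two spaces of indentation per level (assumes no cycles)."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node)
        lines.extend(render(children.get(node, []), children, depth + 1))
    return lines


roots, children = build_tree(associations, "broader-narrower", "broader", "narrower")
print("\n".join(render(roots, children)))
```

Because the association type and role types are parameters, the same code could serve any hierarchy-like relationship an application chooses to treat as a tree.
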
Topic Map Application Patterns


Topic Map Application Patterns provide high-level architectural patterns and principally concentrate on the integration of a topic map processing system with other data systems and applications. These include patterns for representing information from external data systems as topic map data, patterns for importing information from external data systems, and patterns for exporting and displaying topic map data. We will discuss topic map applications in more detail in a following paper.

Future Developments
At the time of writing, further work is under way within ISO, both on the Topic Maps standard itself and on a suite of companion standards. Although ISO/IEC 13250 has been through a revision, the core of the standard has remained unchanged since 1999, a fair degree of stability in comparison to many Internet standards. However, the ISO committee has decided that the next version of the standard will bring a significant overhaul of the way the standard is presented and a minor overhaul of the standard itself.


Figure 5. Hierarchical Classification Pattern

The ISO/IEC 13250 standard is to be divided into a number of separate parts: a non-normative introduction; a formal description of the underlying data model of topic maps; an XML/XLink-based interchange syntax, with a description of the process of deserializing the syntax into an instance of the data model and of serializing the data model into a document conforming to the interchange syntax; and a canonicalization algorithm for the data model that can be used in topic map processor conformance testing. It is hoped that this reorganization will make the standard more reader-friendly. The revision will also add features that were originally missing and that were felt to be important for future developments (specifically the formal model specification and the canonicalization algorithm). Changes to the standard include the ability to apply datatypes to occurrence values, including the ability to embed XML; the ability to declare a subset of the names of a topic as names to be used for determining topic identity; a clearer model of scope; and a definition of the interchange syntax in W3C XML Schema and RELAX NG, as well as an XML DTD.

In addition to the changes to ISO/IEC 13250, the committee has also commenced work on two companion standards. ISO/IEC 18048: Topic Maps Query Language (TMQL) will define a language for querying the topic map data model, allowing the selection both of topic map constructs (such as topics and associations) and of the data carried by them (topic name or occurrence values, for example). ISO/IEC 19756: Topic Maps Constraint Language (TMCL) will define a schema language for topic maps that allows the schema author to constrain the constructs that can appear in a topic map and how they must relate to one another. As with XML, a schema language for topic maps enables both validation and smarter, schema-driven editing applications. Both of these standards are currently at an early stage of development, with requirements defined and, in the case of TMQL, an initial proposal for the language created. Work on the core standard and on the query and constraint languages can be followed on the ISO Topic Maps website [5].

Summary
This article introduced the topic maps paradigm in the context of the ISO standard.
We presented the principal components of the topic map model, showing how the standard processing components of scope and topic merging give additional power to this model. In a forthcoming article we will present some concrete use-cases for topic maps and show how the topic map model can be used to address many of the information organization and interchange needs of a modern business environment.

Footnotes
1. The word "ontology" in this context means the system of types of topics, occurrences, and associations that together define the classes of things and relationships between things that are documented by a topic map.

References
1. Biezunski M., Newcomb S., Pepper S. (ed.). ISO/IEC 13250:2002, Topic Maps [online]. ISO. PDF format.
2. Moore G., Pepper S. (ed.). XML Topic Maps (XTM) 1.0 [online]. TopicMaps.Org. HTML format.
3. Ahmed K. TMShare: Topic Map Fragment Exchange in a Peer-To-Peer Application. HTML format.
4. Ahmed K. Topic Map Design Patterns for Information Architecture. HTML format.
5. ISO Topic Maps website: http://www.isotopicmaps.org/

About the authors


Kal Ahmed
Kal has worked in SGML and XML information management for 10 years, in both software development and consultancy. He is well known in the topic map community for his work on the open-source Java topic map toolkit TM4J and for his contributions to the development of the ISO standard. Kal has published many articles on topic maps and topic map-related themes and is a frequent conference speaker. Kal is now co-founder of Networked Planet Limited, a company developing topic map tools and topic maps-based applications for the .NET platform.

Graham Moore
Graham has worked for eight years in the areas of information, content and knowledge management as a developer, researcher and consultant. He has held leading roles as CTO of STEP, Vice President of Research & Development at empolis GmbH, and Chief Scientist at Ontopia AS. He has been responsible for the development of knowledge management products including the K42 Topic Map Engine, the X2X Link Management Engine and the e:kms knowledge suite. Graham is co-editor of the XTM 1.0 XML Topic Maps standard and of ISO 13250-1 and -2 (Topic Map Data Model and Syntax), and he is also co-editor of TMCL (Topic Map Constraint Language). Graham is currently co-founder of Networked Planet Limited.

© 2006 Microsoft Corporation. All rights reserved.
