Lecture Notes in Artificial Intelligence 4303
Advances in
Knowledge Acquisition
and Management
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors
Achim Hoffmann
The University of New South Wales, School of Computer Science & Engineering
Sydney 2052, Australia
E-mail: [email protected]
Byeong-ho Kang
University of Tasmania, School of Computing
Hobart Campus, Centenary Building, Hobart, TAS 7001, Australia
E-mail: [email protected]
Debbie Richards
Macquarie University, Department of Computing
Sydney, NSW 2109, Australia
E-mail: [email protected]
Shusaku Tsumoto
Shimane University, School of Medicine, Department of Medical Informatics
89-1 Enya-cho, Izumo 693-8501, Japan
E-mail: [email protected]
CR Subject Classification (1998): I.2.6, I.2, H.2.8, H.3-5, F.2.2, C.2.4, K.3
ISSN 0302-9743
ISBN-10 3-540-68955-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-68955-3 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 11961239 06/3142 543210
Preface
Since knowledge was recognized as a crucial part of intelligent systems in the 1970s
and early 1980s, the problem of the systematic and efficient acquisition of knowledge
has been an important research problem. In the early days of expert systems, the focus of
knowledge acquisition was to design a suitable knowledge base for the problem do-
main by eliciting the knowledge from available experts before the system was com-
pleted and deployed. Over the years, alternative approaches were developed, such as
incremental approaches which would build a provisional knowledge base initially and
would improve the knowledge base while the system was used in practice. Other
approaches sought to build knowledge bases fully automatically by employing
machine-learning methods. In recent years, a significant interest developed regarding
the problem of constructing ontologies. Of particular interest have been ontologies
that could be re-used in a number of ways and could possibly be shared across differ-
ent users as well as domains.
The Pacific Knowledge Acquisition Workshops (PKAW) have a long tradition in
providing a forum for researchers to exchange the latest ideas on the topic. Partici-
pants come from all over the world but with a focus on the Pacific Rim region.
PKAW is one of three international knowledge acquisition workshop series held in
the Pacific-Rim, Canada and Europe over the last two decades. The previous Pacific
Knowledge Acquisition Workshop, PKAW 2004, had a strong emphasis on incre-
mental knowledge acquisition, machine learning, neural networks and data mining.
This volume contains the post-proceedings of the Pacific Knowledge Acquisition
Workshop 2006 (PKAW 2006) held in Guilin, China. The workshop received 81
submissions from 12 countries. All papers were refereed in full length by the mem-
bers of the International Program Committee. A very rigorous selection process re-
sulted in the acceptance of only 21 long papers (26%) and 6 short papers (7.5%).
Revised versions of these papers which took the discussions at the workshop into
account are included in this post-workshop volume. The selected papers show how
the latest international research made progress in the above-mentioned aspects of
knowledge acquisition. A number of papers also demonstrate practical applications of
developed techniques.
The success of a workshop depends on the support of all the people involved.
Therefore, the workshop Co-chairs would like to thank all the people who contributed
to the success of PKAW 2006. First of all, we would like to take this opportunity to
thank authors and participants. We wish to thank the Program Committee members
who reviewed the papers and the volunteer student Yangsok Kim at the University of
Tasmania for the administration of the workshop.
Honorary Chairs
Paul Compton (University of New South Wales, Australia)
Hiroshi Motoda (Osaka University, Japan)
Workshop Co-chairs
Achim Hoffmann (University of New South Wales, Australia)
Byeong-ho Kang (University of Tasmania, Australia)
Debbie Richards (Macquarie University, Australia)
Shusaku Tsumoto (Shimane University, Japan)
Program Committee
George Macleod Coghill (University of Aberdeen, UK)
Rob Colomb (University of Queensland, Australia)
John Debenham (University of Technology, Sydney, Australia)
Rose Dieng (INRIA, France)
Fabrice Guillet (L'Université de Nantes, France)
Udo Hahn (Freiburg University, Germany)
Ray Hashemi (Armstrong Atlantic State University, USA)
Noriaki Izumi (Cyber Assist Research Center, AIST, Japan)
Yasuhiko Kitamura (Kwansei Gakuin University, Japan)
Mihye Kim (Catholic University of Daegu, Korea)
Rob Kremer (University of Calgary, Canada)
Huan Liu (Arizona State University, USA)
Ashesh Jayantbhai Mahidadia (University of New South Wales, Australia)
Stephen MacDonell (Auckland University of Technology, New Zealand)
Rodrigo Martinez-Bejar (University of Murcia, Spain)
Tim Menzies (NASA, USA)
Kyongho Min (Auckland University of Technology, New Zealand)
Toshiro Minami (Kyushu Institute of Information Sciences and Kyushu Uni-
versity, Japan)
Masayuki Numao (Osaka University, Japan)
Takashi Okada (Kwansei Gakuin University, Japan)
Frank Puppe (University of Wuerzburg, Germany)
Ulrich Reimer (Business Operation Systems, Switzerland)
Debbie Richards (Macquarie University, Australia)
Masashi Shimbo (Nara Institute of Science and Technology, Japan)
Hendra Suryanto (University of New South Wales, Australia)
Takao Terano (University of Tsukuba, Japan)
Peter Vamplew (University of Ballarat, Australia)
Takashi Washio (Osaka University, Japan)
Ray Williams (University of Tasmania, Australia)
Shuxiang Xu (University of Tasmania, Australia)
Seiji Yamada (National Institute of Informatics, Japan)
Table of Contents
Abstract. The wide use of the Internet and the continuous improvement of
communication technologies have led users to need to manage multimedia
information. In particular, there is broad consensus on the need for
new computational systems capable of processing images and “understanding”
what they contain. Such systems would ideally make it possible to retrieve multimedia
content, to improve the way it is stored, and to process images to extract
information of interest to the user. This paper presents a methodology for
semi-automatically extracting knowledge from 2D still visual multimedia
content, that is, images. The knowledge is acquired through the combination of
several approaches: computer vision (to obtain and analyse low-level features),
qualitative spatial analysis (to obtain high-level information from low-level
features), ontologies (to represent knowledge), and MPEG-7 (to describe the
information in a standard way so that the system can answer queries and
retrieve multimedia content).
1 Introduction
An immense amount of visual information is becoming available in digital
form, in digital archives, on the World Wide Web, in broadcast data streams and in
personal and professional databases, and this kind of information keeps growing.
Nowadays it is common to have access to powerful computers capable of
executing complex processes; despite that, there is no efficient approach for
processing multimedia content to extract high-level features (i.e., knowledge) from it.
Moreover, many processes in critical domains use multimedia content as their
primary data source.
It is clear that new computational systems capable of processing and “understanding”
multimedia content are needed, so that tasks such as multimedia content retrieval, storage
and processing can be performed more efficiently. In this way, images can be processed
to extract information that is interesting for the user, who is not interested in the low-level
features of multimedia information but in the high-level ones (i.e., the meaning of the
content). This is the so-called semantic gap: how to bridge low-level and high-level
features. It refers to the cognitive distance between the analysis results delivered by
state-of-the-art image-analysis tools and the concepts humans look for in images [4].
Traditionally, textual features such as filenames, captions, and keywords have been
used to annotate and retrieve images [7]. Research on intelligent systems for
extracting knowledge or meta-information directly from multimedia content has
increased in recent years. Examples include systems that work with sport
videos, recognising certain kinds of events from the audio commentary [12,13,14].
But this is not enough to obtain meta-information about the image content. Many
content-based image retrieval systems have been proposed in the literature [1,2,3]. Most
of them try to get more information by analysing the image to compute low-level
features such as colours, textures, and shapes of objects, but this is not sufficient to
obtain real information about what an image contains [11].
In this work, an approach to obtaining high-level features from images using
ontologies and qualitative spatial representation and reasoning is presented. This
approach extracts relationships between the regions of the image by using their low-
level features obtained in the segmentation step. It then creates a content
representation in which the regions are concepts, the low-level features are their attributes
and the relationships are inferred knowledge. This information is then used to
compare the structure against ontologies stored in libraries, so that the system can infer
what each region really is and, perhaps, what the image represents. An advantage of
using semantic approaches is that they do not require the framework to be re-designed
for different domains: they provide a new layer that is completely
independent of the methods and techniques used to process the image.
The structure of this paper is as follows. In section 2, the technical
background of this methodology is discussed. An overview of the methodology
proposed for this work is described in section 3. Section 4 describes the processes for
extracting high-level information from images. An example of the methodology is
shown in section 5. Finally, some conclusions are put forward in section 6.
2 Technical Background
In this section, the basic methodological components of our approach are briefly
explained.
which the system must try to find out their real meaning. Once segmentation has been
performed, a set of low-level features are obtained for each region.
2.3 Ontologies
2.4 MPEG-7
The goal of the framework is to get high level information from an input image.
This process may be supported by an expert during the image processing task to
extract the image segments.
At the beginning, some filters and techniques are applied to the input image to
determine the segments. This step may be done by an expert if the content of the
image is not previously known. This may be done (semi) automatically if the system
has analysed similar images (the same domain) before, so it knows which algorithms
and techniques to apply. After that, the system knows every segment that composes
the whole image; so, it is possible to extract low level features for each segment and
for the whole image in general (e.g. the background dominant colour). Once
segmentation has been performed, a set of low-level features are obtained for each
segment. This process will be explained in the next section.
Another sub-system will be able to obtain qualitative spatial information between
segments, so that a structure of topological relationships between concepts will be obtained
(e.g. “A is left of B”, “C is similar size to D”, …).
Once the structure is obtained, it can be compared with all the ontologies with
topological information stored in our library subsystem in order to match the structure with an
ontology. If a match is found, the system is able to interpret the image in the context of a
particular domain.
As mentioned above, the input is an image. The system processes each
image to obtain the elements that appear in it, using several techniques based
on segmentation, which is the process that partitions the spatial domain of an image or
other raster datasets, such as digital elevation models, into mutually exclusive parts called
regions. After that, the system computes several quantitative features [6] for each element
found. These features are listed in the following table:
Feature        Description
Position       The portion of space that is occupied by the object.
Orientation    Where the object is pointing to.
Location       The location of the object in the image (e.g. far north).
Size           The area of the segment.
Compactness    Represents the density of an object.
Dimension      Composed of two properties: width and height.
Perimeter      The distance around the figure.
Shape          The visual appearance of the region.
Colour         The visual attribute of the region resulting from the light it emits, transmits or reflects.
Texture        The tactile quality of a surface, or the representation or invention of the appearance of such a surface quality.
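As a concrete illustration of how this feature set might be carried through the rest of the pipeline, the following minimal Python sketch bundles the features of one segmented region into a single record. It is not the authors' implementation; the field names and the example values are assumptions chosen to mirror the table above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class RegionFeatures:
    """Quantitative features computed for one segmented region."""
    label: str                     # identifier assigned to the region, e.g. "A"
    position: Tuple[float, float]  # centroid of the space occupied by the object
    orientation: float             # direction the object is pointing to, in degrees
    location: str                  # coarse location in the image, e.g. "far north"
    size: float                    # area of the segment, in pixels
    compactness: float             # density of the object (e.g. area / bounding-box area)
    width: float                   # dimension: width
    height: float                  # dimension: height
    perimeter: float               # distance around the figure
    shape: str                     # descriptor of the visual appearance of the region
    colour: Tuple[int, int, int]   # dominant RGB colour of the region
    texture: str                   # texture descriptor

# Example region from a segmented image (values are illustrative only).
ear = RegionFeatures("A", (40.0, 120.0), 90.0, "middle west", 850.0, 0.72,
                     20.0, 55.0, 130.0, "ellipse", (224, 172, 105), "smooth")
```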
Once all this information has been obtained, it can be represented in MPEG-7,
which provides descriptors for representing general information about the image and about each
region. Some of these descriptors define basic structures such as colour, texture, shape and
localization, while others cover other kinds of multimedia content such as video or audio.
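To make the MPEG-7 step concrete, the sketch below (reusing the hypothetical RegionFeatures record from the previous sketch) serializes one region as an MPEG-7-style XML description using only the Python standard library. The element names merely approximate MPEG-7 visual descriptors such as StillRegion or DominantColor; a real system would have to conform to the actual MPEG-7 schema.

```python
import xml.etree.ElementTree as ET

def region_to_mpeg7(region: "RegionFeatures") -> ET.Element:
    """Serialize one region as an MPEG-7-style StillRegion description.

    Element names only approximate MPEG-7 visual descriptors; they are
    placeholders, not a validated MPEG-7 document.
    """
    still_region = ET.Element("StillRegion", id=region.label)
    colour = ET.SubElement(still_region, "DominantColor")
    colour.text = " ".join(str(c) for c in region.colour)
    ET.SubElement(still_region, "TextureDescriptor").text = region.texture
    ET.SubElement(still_region, "ShapeDescriptor").text = region.shape
    locator = ET.SubElement(still_region, "RegionLocator")
    locator.set("position", f"{region.position[0]} {region.position[1]}")
    locator.set("size", str(region.size))
    return still_region

# Using the example region defined above:
print(ET.tostring(region_to_mpeg7(ear), encoding="unicode"))
```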
Qualitative representation has already been used in computer vision for visual object
recognition at a higher level, which includes the interpretation and integration of
visual information. The use of qualitative spatial information helps to ensure that
semantically close scenes have highly similar descriptions. Hence, it is possible to
recognise images that represent the same content. Our approach uses ontologies to
define a scenario topologically, so the system can compare the information obtained
from the image with the ontologies in order to infer the content of an image. In
this way, the system is able to interpret the results of low-level computations as higher-
level scene descriptions.
In order to achieve our objective, the system must find out all the spatial
relationships existing between all the regions detected in the image through the
previous phase. These relationships will give us information about how the regions
are spatially ‘related’, that is, how the scenario is configured. The result of this
process is a ‘graph’ where the nodes or concepts (regions) are related to each other by
using different kinds of spatial relationships. The spatial relationships the system can
work with are explained below.
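The sketch below illustrates one way such a relationship graph could be derived from the centroids and areas of the segmented regions. It is an illustration rather than the authors' algorithm: the relation names follow the examples used in the text (LEFT, ABOVE, SIM_SIZE), image coordinates are assumed to grow rightwards and downwards, and the 20% size-similarity threshold is an arbitrary assumption.

```python
def spatial_relations(regions):
    """Derive qualitative spatial relations between segmented regions.

    Each region is a dict with a label, a centroid (x, y) and an area.
    Returns (relation, label_a, label_b) facts, the edges of the relation graph.
    """
    facts = []
    for a in regions:
        for b in regions:
            if a is b:
                continue
            if a["centroid"][0] < b["centroid"][0]:
                facts.append(("LEFT", a["label"], b["label"]))
            if a["centroid"][1] < b["centroid"][1]:   # smaller y = higher up in the image
                facts.append(("ABOVE", a["label"], b["label"]))
            if (a["label"] < b["label"]
                    and abs(a["area"] - b["area"]) <= 0.2 * max(a["area"], b["area"])):
                facts.append(("SIM_SIZE", a["label"], b["label"]))
    return facts

# Illustrative regions loosely following the head example of section 5.
regions = [
    {"label": "A", "centroid": (10, 50), "area": 850},  # left ear
    {"label": "B", "centroid": (90, 50), "area": 860},  # right ear
    {"label": "D", "centroid": (50, 80), "area": 400},  # mouth
]
for rel, x, y in spatial_relations(regions):
    print(f"{rel}({x},{y})")   # e.g. LEFT(A,B), ABOVE(A,D), SIM_SIZE(A,B)
```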
5 Example
Let us illustrate how the methodology works through a very simple example. Let us
suppose that the system must analyse the following image of a head. It should be
noted that the system does not know what it is.
In this case, a human being can easily see a mouth, two eyes, a nose and two ears, that
is, a head. Let us see step by step how the system might come to the same conclusion.
As described before, the segmentation process is a difficult task. It usually
needs the support of an expert (at least once for each kind of image/domain) to obtain all
the regions. In this example, the image to process is already segmented (notice that
each region has a colour different from that of the surrounding regions). The result of the
segmentation is shown in figure 3:
Each region has been labeled, so a human being would say that A and B are ears, C
is the whole face, D is the mouth, E is the nose, and F and G are the eyes.
For each region, the system obtains all the (qualitative or quantitative) attributes
mentioned before (position, size, etc.) so that the next step can be performed.
Once the segmentation task has been performed, the system uses the information
obtained in the previous step to get qualitative spatial relationships between all the
regions of the image. In our example, some of the relationships the system may get
are shown in the following table:
Step 3: Comparing the structure obtained with the ontologies in the library
Once the system has found all the regions and the relationships between them, it will
be capable of comparing the structure obtained with the ontologies of the library in
order to determine whether the image represents something described in an ontology.
In our knowledge base, a head is described by using the ontology shown below.
Notice that some relationships have been omitted, such as “right of”, because we
have considered “left of” (the inverse relationship of “right of”).
Our system is capable of comparing the ontology and the spatial information
obtained to guess that the image contains a head and that what was labeled A, B,
C, … is a LEFT EAR, RIGHT EAR, FACE, and so on, respectively. The matching is
based on the ontology structural axioms. Each concept has a list of structural axioms
(e.g., “A is a concept”, “A is related to B”, …) and each rule obtained is a potential
axiom. If a subset of the rules characterising an object in the image satisfies all
the axioms of a concept, the system infers that this object is an instance of that concept.
LEFT(A,B)        LEFT(LEFT_EAR, RIGHT_EAR)
LEFT(A,C)        LEFT(LEFT_EAR, FACE)
LEFT(A,D)        LEFT(LEFT_EAR, MOUTH)
LEFT(A,E)        LEFT(LEFT_EAR, NOSE)
LEFT(A,F)        LEFT(LEFT_EAR, LEFT_EYE)
LEFT(A,G)        LEFT(LEFT_EAR, RIGHT_EYE)
SIM_SIZE(A,B)    SIM_SIZE(LEFT_EAR, RIGHT_EAR)
ABOVE(A,D)       ABOVE(LEFT_EAR, MOUTH)
SIM_SHAPE(A,B)   SIM_SHAPE(LEFT_EAR, RIGHT_EAR)
…                …
⇒ A is a LEFT_EAR
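The following minimal sketch illustrates the structural-axiom matching idea; it is not the authors' matching algorithm. A candidate mapping from region labels to ontology concepts is accepted only when every ontology axiom whose concepts are covered by the mapping is witnessed by a relation extracted from the image. The relation and concept names are taken from the example above.

```python
def satisfies(mapping, image_facts, ontology_axioms):
    """Check a candidate region-to-concept mapping against the ontology axioms.

    mapping:         e.g. {"A": "LEFT_EAR", "B": "RIGHT_EAR", ...}
    image_facts:     relations extracted from the image, e.g. ("LEFT", "A", "B")
    ontology_axioms: relations asserted between concepts in the ontology,
                     e.g. ("LEFT", "LEFT_EAR", "RIGHT_EAR")
    """
    translated = {(rel, mapping[x], mapping[y]) for rel, x, y in image_facts}
    covered = set(mapping.values())
    return all(axiom in translated
               for axiom in ontology_axioms
               if axiom[1] in covered and axiom[2] in covered)

image_facts = {("LEFT", "A", "B"), ("SIM_SIZE", "A", "B"), ("ABOVE", "A", "D")}
head_axioms = {("LEFT", "LEFT_EAR", "RIGHT_EAR"),
               ("SIM_SIZE", "LEFT_EAR", "RIGHT_EAR"),
               ("ABOVE", "LEFT_EAR", "MOUTH")}
mapping = {"A": "LEFT_EAR", "B": "RIGHT_EAR", "D": "MOUTH"}
print(satisfies(mapping, image_facts, head_axioms))  # True: A can be interpreted as LEFT_EAR
```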
segmentation process. After that, a knowledge engineer must define an ontology for
the domain in question with topological information, so that the system will be
able to detect breast lumps and even say whether a lump is malignant or not (by using the
information in the ontology).
We are currently developing a software system which will implement this
methodology and will be used in a medical domain to detect some kinds of tumours
semi-automatically. To be more precise, the system will automatically detect every
part of the body in the image and infer whether there is something wrong, quickly
detecting a possible tumour. This high-level information will be stored in MPEG-7
format in order to make it easily available to the hospital staff.
Acknowledgements
This work has been possible thanks to the Spanish Ministry for Science and Education
through the projects TSI2004-06475-C02; the Seneca Foundation through the Project
00670/PI/04; FUNDESOCO through project FDS-2004-001-01; the Autonomous
Community of Murcia Region through project 2I05SU0013; the European
Commission ALFA through project FA-0447.
References
[1] Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Query by image
and video content: the QBIC system. IEEE Computer, 28(9), 23-32. 1995
[2] Jain, A.K., Vailaya, K. Image retrieval using color and shape. Pattern recognition, 29,
1233-1244. 1996
[3] Rui, Y., Huang, T.S., Chang, S.F. Image retrieval: current techniques, promising
directions, and open issues. Journal of Visual Communications and Image representation,
10, 39-62. 1999
[4] Hollink, L., Nguyen, G., Schreiber, G., Wielemaker, J., Wielinga, B., Worring, M.
Adding spatial semantics to image annotations. Workshop of Language and Semantic
Technologies to support Knowledge Management Processes. EKAW 2004. 2004
[5] Cohn, A.G., Hazarika, S.M. Qualitative Spatial Representation and Reasoning: An
Overview. Fundamenta Informaticae 43, pag. 2-32. 2001
[6] Galton, A., Qualitative Spatial Change. Oxford University Press, Inc., New York. 2000
[7] Srihari, R.K., Zhang, Z. Show&Tell: A semiautomated image annotation system. IEEE
Multimedia. July-September 2000, 61-71. 2000
[8] MPEG Requirements Group. “MPEG-7 Overview”, Doc. ISO/MPEG N2727, MPEG
Palma de Mallorca Meeting, October 2004.
[9] Skiadopoulos, S., Koubarakis, M., Composing Cardinal Directions Relations. Artificial
Intelligence vol. 152 (143-171). 2004
[10] Hollink, L., Nguyen, G., Schreiber, G., Wielemaker, J., Wielinga, B., Worring, M.
Adding spatial semantics to image annotations. Workshop of Language and Semantic
Technologies to support Knowledge Management Processes. EKAW 2004. 2004
[11] Antani, S., Lee, D.J., Rodney-Long, L., Thoma, G.R., Evaluation of shape similarity
measurement methods for spine X-ray images. Visual Communication & Image
Representation, vol. 15 (285-302). 2004
[12] Denman, H., Rea, N., Kokaram, A. Content Based Analysis for Video from Snooker
Broadcasts. Lecture Notes in Computer Science. Volume 2383. Ed. Springer. Berlin.
2002
[13] Assfalg, J., Bertini, M., Colombo, C., Del-Bimbo, A., Nunziati, W. Semantic annotation
of soccer videos: automatic highlights identification. Computer Vision and Image
Understanding. 2003
[14] Andrade, E.L., Woods, J.C., Khan, E., Ghanbari, M. Region-based analysis and retrieval
for tracking of semantic objects and provision of augmented information in interactive
sport scenes. IEEE Transactions on Multimedia. Vol. 7, Issue 6. pp1084-1096. 2005
[15] G. Van Heijst, A. T. Schreiber, & B. J. Wielinga, ‘Using explicit ontologies in KBS
development’. International Journal of Human-Computer Studies, 45, 183-292, 1997
[16] Gonzalez, Woods. Digital Image Processing. 2nd Edition. Ed. Prentice Hall. 2002
Ad-Hoc and Personal Ontologies:
A Prototyping Approach to Ontology Engineering
Debbie Richards
Computing Department,
Division of Information and Communication Sciences,
Macquarie University, Australia
[email protected]
1 Introduction
Similarly in knowledge engineering a case can be made for using techniques which
develop ad-hoc and personal ontologies, which can be likened to an evolutionary or
even throwaway prototype, as an alternative or exploratory precursor to the
development of large-scale and/or common ontologies. This is almost the opposite of
approaches which use personal ontologies to extract, restrict or guide an individual’s
usage of, or access to, a larger ontology. For example, the work of Haase et al [18] allows
the user to interact in usage or evolution mode with the ACM Topic Hierarchy, a
domain ontology in Bibster. Usage mode restricts the user’s view of the domain
ontology to the topics the user has chosen to include in their personal ontology, while
evolution mode allows the ontology to be extended for the individual. As [18] points
out, this raises issues of management of the changing ontology and thus their work
provides various change and alignment operations.
Approaching from the other direction, Chaffee and Gauch [7] ask the user to build
a personal ontology in the form of a tree containing at least ten nodes and five pages
per node (the goal of the ontology is to assist with web navigation) to represent their
view of the world. The personal ontology is then mapped to a reference or upper level
ontology. Similarly, the SemBlog personal publishing system uses a “loose and
bottom-up ontology” based on a hierarchy of categories defined by the user on the
basis that “everyone has those categories” which they “routinely [use to] classify …
contents to the category” [28, p. 601].
Some approaches provide technical assistance for personal ontology development.
Carmichael, Kay and Kummerfield [5] use the Verified Concept Mapping technique
based on concept mapping commonly used in education and for knowledge
elicitation. The system contains a number of semantic concepts. These concepts are
shown to the user and may be used as building blocks to develop a personal ontology.
Additionally the system allows the user to define their own concepts and add these to
the model, but the system will not understand the semantics of user-defined concepts.
Likewise, OntoPIM [21] uses a personal ontology to assist the user to manage their
personal desktop information. The personal ontology is developed by providing a
Semantic Save function which allows capture of domain independent as well as
domain specific metadata when an object, such as a picture or a document, is saved.
Following this step, concepts are automatically mapped into the personal ontology by
the system.
Sometimes adhoc and temporary ontologies are used for translating from one
representation to another. For example, Moran and Mocan [27] created an adhoc
ontology equivalent to an XML schema to be used by a Web Service Description
Language (WSDL) description to translate between XML and the Web Services
Modeling Language (WSML).
In contrast to all of the aforementioned research, this paper looks at the use of
personal and ad-hoc ontologies to enable understanding of the domain to be gained
and enhanced, just as one would build a throwaway or evolutionary prototype to
better or incrementally understand the system requirements, application domain or
test a design solution. In knowledge engineering, repertory grids [36] and Protégé
[14] have been used to aid the user to discover and develop their own knowledge in a
domain. In some cases the systems built acted as a communication channel to share
knowledge even though the end product may have never been deployed. Personal
systems, and this includes ontologies, are often more acceptable to users as they tend
to be more relevant and meaningful for the individual and allow the user to use their
own terminology and structure according to the user’s context and preferences.
However, unlike the use of Protégé or repertory grids, the ontology development
approach described in the next section automatically generates ontologies from other
sources. When changes to the sources occur, the ontologies are simply regenerated.
Such a strategy is acceptable if maintenance and ongoing reuse of the ontology is not
required, as in the case of a throwaway or exploratory prototype or model.
In various projects over the past decade, Formal Concept Analysis (FCA) [40] has
been used to build domain-specific, personal and/or shared, ad-hoc and usually
throwaway ontologies from a number of alternative sources including propositional rules,
cases, use cases, software specifications, web documents and keywords. FCA
achieves this through the notion of a concept as a basic unit of thought comprising a
set of objects and the set of attributes they share, thus providing an intensional and
extensional definition for each primitive concept. FCA then applies various
algorithms based on lattice and set theory to generate new concepts and allow
visualization of the consequences of partial order. Section 2 of this paper provides an
example of the technique. Section 3 introduces some of the applications. Conclusions
are given in section 4.
Table 1. Formal context (crosstable) derived from the Cendrowska contact lens rules

Object (rule)     | Age=young | Age=presbyopic | Prescription=myope | astigmatic=yes | astigmatic=no | Tear_prod=normal | 1=1 (1)
Rule 0 Lens=None  |           |                |                    |                |               |                  |   X
Rule 1 Lens=Soft  |           |                |                    |                |      X        |        X         |   X
Rule 2 Lens=Hard  |           |                |         X          |       X        |               |        X         |   X
Rule 3 Lens=Hard  |     X     |                |                    |       X        |               |        X         |   X
Rule 4 Lens=None  |           |       X        |         X          |                |      X        |        X         |   X
The set of concepts are derived from the formal context in Table 1 by treating each
row as an (object) concept and generating additional higher level concepts by finding
the intersection of sets of attributes and the set of objects that share the set of
attributes. For example, rules 1-4 (last four rows in Table 1) share the attribute:
tear_prod=normal. This forms a new concept as shown in concept 2 in Fig. 1. Once
all concepts have been found, predecessors and successors are determined using the
subsumption relation ≥. This allows the complete lattice to be drawn. Disjunctions of
conditions and negation must be removed to allow the rules to be converted into a
binary crosstable. Fig. 1 shows the concept lattice for the Contact Lens Prescription
domain. To find all attributes (rule conditions) and objects (rule numbers and
conclusion codes in our technique) belonging to a concept, traverse all ascending and
descending paths, respectively. For example, concept 7 in Fig. 1 includes the rule
conditions (attributes) {prescription=myope, tear_production=normal, 1=1} and the objects
{4-%LENSN (i.e. rule 4, Lens=None) and 2-%LENSH (i.e. rule 2, Lens=Hard)}.
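To make the construction concrete, the following sketch enumerates the formal concepts of the Table 1 context in plain Python. It is a minimal illustration, not the tool shown in Fig. 1, and it relies on the fact that every intent is an intersection of object intents (plus the full attribute set). The object and attribute spellings are adapted from Table 1.

```python
from itertools import combinations

# Formal context from Table 1: objects are rules, attributes are rule conditions.
context = {
    "Rule0 Lens=None": {"1=1"},
    "Rule1 Lens=Soft": {"astigmatic=no", "tear_prod=normal", "1=1"},
    "Rule2 Lens=Hard": {"prescription=myope", "astigmatic=yes", "tear_prod=normal", "1=1"},
    "Rule3 Lens=Hard": {"age=young", "astigmatic=yes", "tear_prod=normal", "1=1"},
    "Rule4 Lens=None": {"age=presbyopic", "prescription=myope", "astigmatic=no",
                        "tear_prod=normal", "1=1"},
}

def formal_concepts(ctx):
    """Enumerate all formal concepts (extent, intent) of a small context.

    Every intent is an intersection of object intents (together with the full
    attribute set), so brute force is sufficient for a context this size.
    """
    intents = {frozenset(set().union(*ctx.values()))}       # bottom concept's intent
    for r in range(1, len(ctx) + 1):
        for combo in combinations(ctx.values(), r):
            intents.add(frozenset(set.intersection(*combo)))
    concepts = []
    for intent in intents:
        extent = {obj for obj, attrs in ctx.items() if intent <= attrs}
        concepts.append((extent, set(intent)))
    return concepts

for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[1])):
    print(sorted(extent), "<->", sorted(intent))
```

Running the sketch reproduces, among others, a concept with intent {prescription=myope, tear_prod=normal, 1=1} and extent {Rule 2, Rule 4}, which corresponds to concept 7 in Fig. 1.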
From this example we can start to explore the relationships between the rules to
improve our understanding of the domain. Table 1 has been provided for explanation of
the approach; however, the user of this ontology development technique would not be
required to define or work with the crosstable. From the user’s point of view, they
would firstly select the knowledge base or dataset of interest and then select which
parts of the knowledge base or dataset that they wished to explore. This could be
achieved via specifying one or more key words that are used to automatically select all
cases or rules which contain the keywords. As the Cendrowska knowledge base is very
small, all rules have been included. By looking at Fig. 1 we see the importance of the
tear_production=normal concept and deduce that if we see a case where tear_production
is not =normal then the default recommendation of “no lens” will be given.
Alternatively, the absence of a condition covering the abnormal state may prompt the
user to consider whether the default rule is adequate or whether an alternative or
additional recommendation should also be given such as treatment=tear_duct_operation.
Moving further down the lattice we can see that astigmatic is an important feature that
will affect the prescription. If astigmatic=no then a soft lens is recommended, but only
1. 1=1 is the default condition that is always true. Rule 0 is the default rule, which will be true if
no other rule fires; that is, the prescription is “no lens” unless …
when the age=presbyopic and prescription=myope conditions are not true (concept 4
shows the exception rule stated in rule 4). If astigmatic=yes and the prescription=myope
or age=young then a hard lens is recommended. While it is true that there is nothing
shown in the concept lattice that cannot be extracted by manually analysing the rules,
it should be apparent that the relationships between the rule conditions and conclusions
are more structured and more easily determined in the lattice. As in this example, the
increased clarity can be useful in identifying knowledge that is potentially missing. The
ontology has served as a means of understanding the domain and perhaps for
validating/updating the Cendrowska knowledge base which was presumably used to
provide expert opinion.
Fig. 1. The Diagram Screen in our tool which shows the Concept Lattice for the Formal
Context in Table 1
The Cendrowska contact lens dataset and rules have been used by a number of
researchers to demonstrate the value of various knowledge representations such as PRISM
[6], INDUCT [15] and the Visual Language supported by Personal Construct Psychology
[16]. We note that this data and knowledge are now out of date, since the introduction of
Rigid Gas Permeable (RGP) contact lenses has made hard contact lenses almost
obsolete. Improvements have also been made to soft lenses in the past decade. Based on a
recently expressed viewpoint of an expert optician2 we demonstrate how new knowledge
can be added to a crosstable for comparison with the old knowledge. The purpose of this
comparison may be to determine if any conflicts have arisen and whether the original
knowledge base needs updating. We do not include every attribute that could have been
used. The rules created (i.e. the rows) and shown in Table 2 are our interpretation of the
information given in the article and are not based on the use of a machine or human built
set of rules developed from cases. We assume that had the optician’s client data been
available there would be multiple rules with exceptions to cover the four possible
classifications. However, our technique is adequate for the purposes of demonstrating how
knowledge from multiple sources of expertise can be displayed and reconciled using
an FCA-built ontology.
2. http://www.epinions.com/well-review-5196-AFCVA2d-394701a9-prod2
Table 2. Formal context for the updated (V2) contact lens knowledge. Attributes (columns):
astigmatic=yes, Visual acuity=high, Age=presbyopic, Custom Made=yes, Price=low,
Healthy=yes, Uncomfortable=yes, 1=1. Objects (rows): V2-Rule 0 (Lens=None, 1 condition),
V2-Rule 1 (Lens=Soft, 5 conditions), V2-Rule 2 (Lens=Hard, 5 conditions),
V2-Rule 3 (Lens=RGP, 7 conditions).
3. http://www.sourceforge.net/projects/conexp
Fig. 2. The Diagram Screen in ConExp which shows the Concept Lattice for the Formal
Contexts in Tables 1 and 2
Common ontologies seek to provide a reusable library of concepts. Adequate time and
effort needs to be taken to get the concepts right, define the terms, relationships,
axioms and so on. The approach to development is typically top-down. FCA, however,
allows us to work from the bottom up, with minimal modeling of the domain beyond the
creation of a crosstable containing objects and attributes. As demonstrated in the
previous section, what is modeled as an object or an attribute can vary according to
the input or questions to be explored. Table 3 provides a summary of a number of
projects that have used FCA derived ontologies using input other than conventional
cases for a range of different purposes. In each of the projects, the interplay of FCA
and ontologies has provided a learning technique, allowed analysis and navigation of
the derived ontology and the ontology has enhanced the FCA application [10]. The
list of projects is far from exhaustive but gives a taste of the possibilities.
Elsewhere discussion can be found on the nature of the ontology developed using
FCA (e.g. [10, 19]) together with a comparison of other techniques for ontology
development (e.g. [32]). The purpose of this paper is to consider the role and value of
using a domain and/or individual specific ontology as a communication channel, or
alternatively, a mediating or temporary representation.
Whether FCA is used to compare rules in a knowledge base, use case descriptions
or documents, the approach allows and encourages individuals to express themselves
using their own terms and on their own without the interference and restrictions
associated with group thinking. This has the benefit of increased engagement with the
task and ownership of the knowledge. This becomes even more important when the
goal is to build a shared model. By starting with separate sources each group member
owns and defends a viewpoint to provide a truly representative and more complete
final model. Just as a prototype developed with a 4GL may not give the developer as
much freedom and control over the application developed as they might like, the end
user can see results sooner and may even be able to use the 4GL to develop the
We note that there is some conflict between the goal of ontologies and their actual
usage. Some have argued that Gruber’s definition has led to a view that an ontology is:
“ ‘a model’ where what is being modeled are the concepts or ideas people have
in their minds. This reductive error has its roots in the recent tendency to use
the word ‘ontology’ to mean little more than a controlled vocabulary with
hierarchical organization”4.
When one remembers that ontologies are concerned with the nature of being and the
world, the focus on what is in people’s heads or the words they use fits more with
linguistics, psychology or epistemology rather than metaphysics.
We also note that the goal to create large-scale common ontologies sought to
address the KA bottleneck by providing guidance and allowing sharing and reuse.
However, it is unclear whether ontology engineering has simply moved the bottleneck
higher and earlier in the KA process. Also the desire to share and reuse has led to the
need for strategies for merging and reconciliation.
4. http://ontologyworks.com/what_is_ontology.php
Currently, ontologies are seen to play a pivotal role in the Semantic Web, together
with semantic markup languages. However, the effort involved in the two-step
authoring and annotation process in a formalism such as OIL and/or RDF “tends to
reintroduce the impulse to set up the ‘right’ ontologies in advance. This seems
contrary to letting ‘anyone say anything’ [2] or, perhaps, it simply raises the burden of
generating Semantic Web content to an inhibitory level” [20]. To address this issue,
Kalyanpur et al [20] offer the Semantic Markup, Ontology and RDF Editor (SMORE)
to support adhoc ontology use, modification, combination and extension. However,
what SMORE attempts to achieve is a more seamless environment which merges and
simplifies authoring and annotation. The approach allows adhoc use of ontologies, not
to be confused with the use of adhoc ontologies as addressed in this paper. However,
similar to SMORE, we seek to offer a practical approach.
Bennett and Theodoulidis [1] have investigated the notion of personal ontology
and its relationship to organizational ontology and knowledge. They see a
personal ontology as the outcome of personal world experiences, which lead to personal
knowledge that in turn forms the personal ontology. When individuals begin to share their
ontologies and agree on meanings, organizational ontologies begin to emerge. In
contrast to the flow from experience to knowledge to ontology for the individual, at the
organizational level, once an organizational ontology exists, organizational knowledge can
emerge resulting in experiences at the organizational level which feed back into personal
experiences. If such a cycle does exist, it may be necessary to ensure that approaches for
engineering common, upper or reference ontologies support the ongoing development,
sharing and integration of personal ontologies.
Despite its various shortcomings, the Waterfall system development life cycle is still
the main development method used in many organizations. In practice the method is often
modified to include incremental and iterative cycles within and between certain phases.
Likewise, common ontologies, such as CYC or WordNet, are in widespread use and are
often used in conjunction with domain-specific and sometimes personal ontologies. The
use of FCA to rapidly develop domain and personal ontologies offers many parallels to
agile software development in that the technique is incremental, rapid and collaborative,
and the development cycle is essentially test-first driven, producing prototypes (ad-hoc
ontologies) for exploring domain- or individual-specific concepts. Using FCA to
automatically generate an ontology from whatever source can be mapped to a
crosstable with minimal effort becomes attractive, particularly where the ontology or
domain itself is volatile and/or temporary.
References
[1] Bennett, B. R. and Theodoulidis, B., Towards a notion of Personal Ontology,
http://citeseer.comp.nus.edu.sg/44041.html, accessed 10th July 2006.
[2] Berners-Lee, T., Hendler, J. and Lassila, O. (2001) The Semantic Web. Scientific
American, May 2001.
[3] Boettger, K. Schwitter, R., Mollá, D. and Richards, D. (2003) Towards Reconciling Use
Cases via Controlled Language and Graphical Models In O. Bartenstein, U. Geske, M.
Hannebauer, O. Yoshie (eds.), Web-Knowledge Management and Decision Support,
LNCS, Vol. 2543, pp. 115-128, Springer Verlag, Heidelberg, Germany.
[4] Busch, P. and Richards, D. (2004) Modelling Tacit Knowledge via Questionnaire Data,
Proc.of 2nd Int.Conf.on FCA (ICFCA 04), Feb 23-26, 2004, Sydney, Australia, 321-329.
[5] Carmichael, D J, J Kay, R J Kummerfeld, (2004) Personal Ontologies for feature
selection in Intelligent Environment visualisations, in Baus, J, C Kray and R Porzel,
AIMS04 - Artificial Intelligence in Mobile System, 44-51.
[6] Cendrowska, J. (1987) PRISM: An algorithm for inducing modular rules. Int. Journal of Man-
Machine Studies 27(4):349-370.
[7] Chaffee, J. and Gauch, S. (2000) Personal Ontologies For Web Navigation. In Int.Conf.
Info. Knowledge Mgt (CIKM), pp. 227-234.
[8] Cho, W. C. and Richards, D. (2004) Improvement of Precision and Recall for Information
Retrieval in a Narrow Domain: Reuse of Concepts by Formal Concept Analysis, Proc.
IEEE/WIC/ACM Int. Conf. Web Intell. (WI'2004), Sept. 20-24, Beijing, China, 370-376.
[9] Cho, W. C. and Richards, D. (2006) Automatic construction of a concept hierarchy to
assist Web document classification, Proc. 2nd Int.Conf.on Info. Mgt and Business
(IMB.2006), 13-16 February, 2006, Sydney, Australia.
[10] Cimiano, P., Hotho, A., Stumme, G. and Tane, J. (2004) Conceptual Knowledge
Processing with Formal Concept Analysis and Ontologies, LNCS, Vol 2961:189 – 207.
[11] Cimiano, P., Staab, S. and Tane, J. (2003) Automatic Acquisition of Taxonomies from
Text: FCA meets NLP, Proc. of the Int. W’shop on Adaptive Text Extraction and Mining.
[12] Colomb, R.M. (1989) Representation of Propositional Expert Systems as Decision Tables
Technical Report TR-FB-89-05 Paper presented at 3rd Joint Aust. AI Conf. (AI'89)
Melbourne, Victoria, Australia 15-17 November, 1989.
[13] Erdmann, M. (1998) Formal concept analysis to learn from the sisyphus-III material.In:
Proc. of 11th KA for KBS Workshop, KAW'98, Banff, Canada 1998.
[14] Eriksson, H., Fergerson, R. W., Shahar, Y. and Musen, M. A. (1999) Automatic
Generation of Ontology Editors In KAW'99, 16-21 October, 1999, Banff.
[15] Gaines, B.R., (1991) Induction and Visualization of Rules with Exceptions In J. Boose &
B. Gaines (eds), Proc.6th Banff AAAI KAW’91, Canada,Vol 1: 7.1-7.17.
[16] Gaines, B. R. and Shaw, M.L.G. (1993) Knowledge Acquisition Tools Based on Personal
Construct Psychology Knowledge Engineering Review 8(1):49-85.
[17] Ganter, B. and Wille, R., (1999) Formal Concept Analysis – Mathematical Foundations,
Springer-Verlag, Berlin.
[18] Haase, P., Stojanovic, N., Völker, J. and Sure, Y. (2005) Personalized Information
Retrieval in Bibster, a Semantics-Based Bibliographic Peer-to-Peer System, Proceedings
of I-KNOW ’05 Graz, Austria, June 29 - July 1, 2005
[19] Kalfoglou, Y, Dasmahapatra, S. and Chen-Burger, Y-H, (2004) FCA in Knowledge
Technologies: Experiences and Opportunities, LNCS 2961, Feb 2004, Pages 252 – 260
[20] Kalyanpur, A., Parsia, B., Hendler, J. and Golbeck, J. (2001) "SMORE - Semantic
Markup, Ontology and RDF Editor". Technical Report www.mindswap.org/~aditkal/
SMORE.pdf
[21] Katifori, V., Poggi A., Scannapieco, M., Catarci, T. and Ioannidis, Y. (2005) OntoPIM:
How to Rely on a Personal Ontology for Personal Information Management, In Proc. of
the 1st Workshop on The Semantic Desktop, 2005.
[22] Kelly, G.A, (1955) The Psychology of Personal Constructs Norton, New York.
[23] Kent, R.E. and C. Neuss. 1995. Creating a Web Analysis and Visualization Environment.
Computer Networks and ISDN Systems, 28.
[24] Kim, S., Hall, W., and Keane, A. (2001). Using document structure for personal
ontologies and user modeling. In Bauer, M., Vassileva, J., and Gmytrasiewicz, P. (Eds.),
User Modeling: Proc. of the 8th In. Conf. UM2001, Berlin: Springer, pages 240–242.
[25] Kim, M. and Compton, P., (2004) Evolutionary Document Management and Retrieval for
Specialized Domains on the Web. Int.l Jrnl of Human Computer Studies. 60(2):201-241.
[26] Maciaszek L.A., Liong B.L., (2005), Practical Software Engineering - A Case Study
Approach, Addison-Wesley.
[27] Moran M. and Mocan A. (2005) Towards Translating between XML and WSML based
on mappings between XML Schema and an equivalent WSMO Ontology: Second WSMO
Implementation Workshop (WIW '05), June 2005, Innsbruck, Austria.
[28] Ohmukai, I, Takeda, H., Hamasaki, M., Numa, K. and Adachi, S. (2004) Metadata-driven
Personal Knowledge Publishing. S.A. McIlraith et al. (Eds.): ISWC 2004, LNCS 3298,
Springer-Verlag Berlin Heidelberg, 591–604.
[29] Prediger, S. and Stumme, G. (1999) Theory-driven Logical Scaling: Conceptual
Information Systems meet Description Logics. KRDB 1999: 46-49
[30] Richards, D. (1998) An Evaluation of the Formal Concept Analysis Line Diagram, Poster
Proc. AI'98, 13-17 July 1998, Griffith University, Brisbane, Australia, 109-120.
[31] Richards, D. (2000) Reconciling Conflicting Sources of Expertise: A Framework and an
Illustration, Proc. of PKAW'2000, December 11-14, 2000, Sydney.
[32] Richards, D. (2003) Merging Individual Conceptual Models of Requirements, Special
Issue on Model-Based Requirements Engineering for the Int. Jrnl of Requirements
Engineering, (2003) 8:195-205.
[33] Richards, D. (2004), Addressing the Ontology Acquisition Bottleneck through Reverse
Ontological Engineering, Jnl of Knowledge and Information Systems (KAIS), 6:402-427.
[34] Richards, D. and Compton, P. (1997) Uncovering the Conceptual Models in Ripple Down
Rules In Dickson Lukose, Harry Delugach, Marry Keeler, Leroy Searle, and John F. Sowa,
(eds) (1997), Conceptual Structures: Fulfilling Peirce's Dream, Proc. of the 5th Int. Conf. on
Conceptual Structures (ICCS'97), LNCS 1257, Springer Verlag, Berlin, 198-212.
[35] Richards, D. and Malik, U. (2003) Multi-Level Knowledge Discovery in Rule Bases,
Applied Artificial Intelligence, Taylor and Francis Ltd, March 2003, 17(3):181-205
[36] Shaw, M.L.G. and Gaines, B.R., (1991) Using Knowledge Acquisition Tools to Support
Creative Processes In Proc.of the 6th KA for KBS Workshop, Banff, Canada.
[37] Spangenberg, N., Wolff, K.E. (1988) Conceptual grid evaluation, In Classification and
related methods of data analysis. Spangenberg, Norbert and Karl Erich Wolff, Eds.,
Amsterdam, North-Holland: 577-580.
[38] Stumme, G. and Maedche, A. (2001) FCA-Merge: Bottom-up merging of ontologies. In
Proc. 17th Intl. Joint Conf. on Artificial Intelligence (IJCAI '01), pages 225-230, Seattle, WA, 2001.
[39] Tilley, T. (2003) A Software Modelling Exercise using FCA. In Proceedings of the 11th
International Conference on Conceptual Structures (ICCS’03), Springer LNAI 2746,
Dresden, Germany, July 2003.
[40] Wille, R. (1992) Concept Lattices and Conceptual Knowledge Systems Computers Math.
Applic. (23) 6-9: 493-515.
[41] Wille, R. (1997) Conceptual Graphs and Formal Concept Analysis. In D. Lukose et.al.,
editor, Conceptual Structures: Fulfilling Peirce's Dream, Springer, LNAI 1257, 290--303.
Relating Business Process Models
to Goal-Oriented Requirements Models in KAOS
1 Introduction
Business Process Management (BPM) in its “third wave” [1] has been conveyed
as: enabling intelligent business management [3]; facilitating the redesign and
organic growth of information systems [4]; and, obliterating the business - IT
divide [1]. Business processes undergo an evolutionary life-cycle of change. This
change is brought on by the need to satisfy the constantly changing goals of
varied stakeholders and adapt to the accelerating nature of change in today’s
business environment [5]. The need for change is best described in [6] as the
transition from an initial “unsatisfactory” (i.e. as-is) state to a new hypotheti-
cally “desired” (i.e. to-be) state. The desired state is theoretically based on the
assumption that it more effectively satisfies related operational goals [4] [7] [8]
in-line with higher-level strategic goals. It is therefore important that the cri-
terion for effective process change - i.e. stakeholder goals, be explicitly stated,
communicated and traceable to any changes that are proposed, approved, and/or
implemented.
The new-found agility provided by BPM, however, presents the need for
methods to successfully control and trace the evolution of processes. This need is
affirmed in [7], which states that organizations evolve from their original intentions
through complex and unpredictable growth. BPM aims to support the evolution
of organizations and their processes, however controls are still needed to ensure
that operational as well as higher-level goals (i.e. of more strategic concern) are
continually satisfied, allowing for “organizational growth in the right direction”.
In order to meet goals however, there is a need to support traceability between
processes and organizational goals - “You can’t manage what you can’t trace”
[9].
We have proposed a method (GoalBPM) to support the controlled evolu-
tion of business processes. Control is supported through the explicit modeling of
stakeholder goals, their relationships (be it either refinement, conflict or obstruc-
tion), and their evolution traceable to related business processes. GoalBPM is
used to couple an existing and well-developed, informal-formal goal modeling and
reasoning methodology - i.e. KAOS [10], and a newly developed business process
modeling notation - i.e. BPMN [11]. This is achieved through the identification
of a satisfaction relationship between the concepts represented. GoalBPM itself
can be seen as an “adapter” that integrates the two models, to support their
co-evolution and synergistic use.
This paper firstly presents a background to the associated domains of busi-
ness process modeling and goal-oriented requirements engineering. An informal
overview of the GoalBPM method is subsequently outlined with a simple exam-
ple for illustration.
bond a package. Finally, the process is completed with an end event, or bold
circle toward the end of a process.
Goal Achieve[PackageSortedToDestination]
InformalDef If a package is received at a sort facility, then the package
will eventually be forwarded to its known destination.
FormalDef ∀ p: Package, sf: SortFacility:
Received(p, sf) ⇒ ◊ Forwarded(p, p.Destination)
Patterns for Declaring and Defining Goals. KAOS defines a number of com-
monly used “Goal Patterns” that generalize the timeliness of target situations.
They provide an informal method to initially declare goals, as well as to guide for-
mal definition. Achieve Goals (C ⇒ ◊T) desire achievement ‘some time in the fu-
ture’. That is, the target must eventually occur (e.g. Achieve[PaymentReceived]).
Cease Goals (C ⇒ ◊¬T) disallow achievement ‘some time in the future’. That
is, there must be a state in the future where the target does not occur (e.g.
Cease[Operation]). Maintain Goals (C ⇒ □T) must hold ‘at all times in the fu-
ture’ (e.g. Maintain[EmployeeSafety]). Avoid Goals (C ⇒ □¬T) must not hold
‘at all times in the future’ (e.g. Avoid[LateEntry]).
identifies a set of critical trajectories from a process model. Third, it identifies the
subset of the set of traceability links that represent satisfaction links by analyzing
critical trajectories relative to process effect annotations. The satisfaction links
thus obtained are descriptive satisfaction links. A final step in GoalBPM is to use
a comparison of the set of normative satisfaction links with the set of descriptive
satisfaction links to drive the processes of goal model update and/or process model
update.
Our approach may be viewed as an instance of the state-oriented view [15]
[16] [17] of business processes as opposed to the agent-oriented or workflow views.
However, we are not explicitly state-based in that we do not seek to obtain state
machine models from process models, for two reasons. First, BPMN models
in general do not guarantee finite state systems, making the application of model
checking techniques difficult. Second, the derivation of state models from BPMN
models appears difficult at this time, due to the high-level, abstract nature of
BPMN models.
the decision on which choice of path to commit to can help to identify important
effects on prior activities in the current, or in other processes. These influential
activities and their required effects for the current path of execution, need to be
identified and represented along with the effects of the current process to prove
goal satisfaction.
We define an effect annotation to include:
– a label that generalizes the behavior of the effect in relation to its environ-
ment (e.g. ‘CustomerDetailsStored’). Whereas the labeling of an activity is
made in the optative mood (i.e. a desire), an effect annotation is made in
the indicative mood (i.e. a fact).
– a designation specifying whether the effect is a ‘normal’ (i.e. expected) out-
come for the activity that in turn aims toward goal achievement, or an ‘ex-
ceptional’ (i.e. unaccepted) effect that deviates from goal achievement. (e.g.
‘RegistrationValidated’ may be a normal outcome for a customer registration
activity, whereas ‘RegistrationRejected’ may be exceptional)
– an informal definition describing the effect in relation
to the result achieved in its environment (e.g. ‘The details relating to the
current customer have been stored within the system.’). This provides an
informal explanation (i.e. meaning) of the effect in relation to the real-world
environment.
– a formal definition (optional) defining achieved states to aid in mapping to
formal goal definitions in the chosen goal definition formalism (i.e. in this
case KAOS). (e.g. ‘∀ c: Customer, (∃ cr: CustomerRecord) Stored(c.Details,
cr)’)
At the tool level, effect annotations can be viewed on a business process model
graphically, or added to meta-information relating to the process activities. They
can then be analyzed along with the process and associated goals as described
in the subsequent sections.
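Purely as an illustration (not the GoalBPM tooling), an effect annotation with these four parts could be captured in a record such as the following sketch; the field names and sample values are assumptions based on the examples above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EffectAnnotation:
    """One effect annotation attached to a process activity."""
    label: str                   # e.g. "CustomerDetailsStored", stated in the indicative mood
    designation: str             # "normal" (expected) or "exceptional" (deviates from goal achievement)
    informal_definition: str     # meaning of the effect in the real-world environment
    formal_definition: Optional[str] = None  # optional mapping to the goal formalism (here KAOS)

stored = EffectAnnotation(
    label="CustomerDetailsStored",
    designation="normal",
    informal_definition="The details relating to the current customer have been stored within the system.",
    formal_definition="∀ c: Customer, (∃ cr: CustomerRecord) Stored(c.Details, cr)",
)
```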
We firstly progress through each process trajectory and compare effects with
traceable satisfaction goals in the goal model. Effects are compared to the
desirability and temporal ordering of effects in normative goals, and descriptive
satisfaction links are established as we progress through the trajectory.
We then analyze and classify each trajectory as either normal or exceptional.
A normal trajectory in relation to the goal model leads to the satisfaction of all
normative goals. An exceptional trajectory, as described in the previous section,
satisfies a limited number of normative satisfaction goals.
In order for a satisfaction relationship to exist between a goal model and an
associated process model, there must be at least one normal trajectory. That is,
the process model must support at least one valid means by which to satisfy the
required normative goals of the process.
Finally, we analyze the outcome of the satisfaction process, identifying whether
the process supports the achievement of normative satisfaction goals, and classify
the satisfaction relationship between the process and the associated goal model
as either strong, weak, or unsatisfied. A strong satisfaction relationship is deter-
mined if all possible trajectories are ‘normal’ (i.e. satisfy all associated goals).
On the other hand, a weak satisfaction relationship is said to exist when there is
at least one ‘exceptional’ trajectory and one ‘normal’ trajectory. This classifica-
tion, delineating between weak and strong satisfaction, can be important when
evaluating the competency of the process in recovering from exceptional situa-
tions that may arise during enactment. An unsatisfied satisfaction relationship
is the result of there not being a single ‘normal’ trajectory decomposed from the
process. This classification requires that changes are made to either the process
and/or goal model to establish a weak or strong satisfaction relationship.
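The classification just described can be summarized in a short sketch. This is an illustration under assumed inputs, not the GoalBPM implementation: each trajectory is assumed to have already been analyzed into the set of normative goals its effects satisfy, and the goal name Maintain[PackageLocationKnown] is purely illustrative (only Achieve[PackageSortedToDestination] comes from the example in the text).

```python
def classify_satisfaction(trajectories, normative_goals):
    """Classify the satisfaction relationship between a process and its goal model.

    trajectories:    maps each trajectory name to the set of normative goals it satisfies.
    normative_goals: the goals the process is expected to operationalize.
    A trajectory is 'normal' when it satisfies every normative goal; otherwise it
    is 'exceptional'. The relationship is strong, weak or unsatisfied as in the text.
    """
    normal = [t for t, satisfied in trajectories.items() if normative_goals <= satisfied]
    if len(normal) == len(trajectories):
        return "strong"
    if normal:
        return "weak"
    return "unsatisfied"

goals = {"Achieve[PackageSortedToDestination]", "Maintain[PackageLocationKnown]"}
trajectories = {
    "trajectory 1": {"Achieve[PackageSortedToDestination]", "Maintain[PackageLocationKnown]"},
    "trajectory 2": {"Maintain[PackageLocationKnown]"},  # exceptional: one goal missed
}
print(classify_satisfaction(trajectories, goals))  # -> "weak"
```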
6 Example
Effect Annotation. Firstly, we annotate the model with effects to identify the
achievable (and alternative) outcomes of activities in the current process. We also
include the pre-conditions themselves, and any other relevant/influential effects
that may be caused by other processes that have a direct impact on process
decisions and coordination. These annotations are listed in Figure 5.
Process initiation is governed by two conditions: the arrival of packages at the sort facility and the provision of package information to transport authorities. It is also identified that the prior provision of information to authorities may allow for the rapid clearance of packages for delivery before the sorting process initiates. This may occur if the requirements of the authority can be identified
as being met. These effects are also added to the list of relevant/influential effects
that may have occurred prior to process initiation.
Each activity is then analyzed and annotated with normal and exceptional ef-
fects. Scanning a package results in the delivery details being known. The package
may be bonded for clearance with another scan being applied, or alternatively held if the transport authority requests it. The latter is an exceptional effect that
occurs due to the package meeting some characteristics that require it to be
given to the authorities that then take sole control of the package from that
point on. The outcome of releasing a package is the passing of transport author-
ity requirements and ultimately clearance. The sorting activity then results in
the package being sorted to its destination. The final scanning activity results in
another update of package location.
A new regulatory goal is then introduced into the goal model, requiring all packages to be screened by the transport authority once they arrive at a sort facility.
Process Implications and Evolution. Any alterations to the goal model will
proportionately affect the desired achievement or coordination of effects within
the business processes that are assigned their operationalization. We re-evaluate
the satisfaction relationship between the ‘Package Sorting’ process and its goals,
and apply some informal analysis to identify specific changes required at the
process level.
Upon evaluation of the satisfaction relationship, it is identified that previ-
ously normal trajectories (i.e. (1) and (3)), are now also exceptional due to their
inability to satisfy regulatory requirements. This is consistent with the modifi-
cations to the goal model.
7 Conclusion
We have proposed the GoalBPM methodology, which can be used to identify the satisfaction of a process model against a goal model. The example we presented provides a brief and informal overview. There are many possible benefits in applying GoalBPM to current business process design and analysis. These include the initial intentional design of business processes that satisfy a deliberate specification of goals. Changes to the business process model may then be made and tested against the specification of the goals they are intended to satisfy in the goal model. Changes may also be made to the goal model and tested against the current business process model to identify behaviors that are invalidated. Invalid behavior may be explicitly defined, supporting further redesign to align the changed processes with organizational goals. To progress from the current state, formalism, tool support, and testing against a large, non-trivial business case are required. We are actively pursuing these requirements, which we hope will increase our understanding of the realizability, workability and viability of GoalBPM in active use.
Heuristic and Rule-Based Knowledge Acquisition:
Classification of Numeral Strings in Text
1 Introduction
Most efforts directed towards understanding natural language in text focus on se-
quences of alphabetical character strings. However, the text may include different
types of data such as numeric (e.g. “25 players”) - or alpha-numeric (e.g. “25km/h”) -
with/without special symbols (e.g. “$2.5 million”) [5]. In current natural language
processing (NLP) systems, such strings are treated as either a numeral (e.g. “25 play-
ers”) or as a named entity (NE, e.g. “$2.5 million”) at the lexical level. However, ambiguity of semantic/syntactic interpretation can arise for such strings when they are considered at the lexical level only: for example, the number “21” in the phrase “he turns 21 today” can on the
surface be interpreted as any of the following: (a) as a numeral of NP (noun phrase) –
indicating NUMBER; (b) as a numeral of NP – indicating the DAY of a date expres-
sion; or (c) as a numeral of NP – indicating AGE at the lexical meaning level. This
type of numeral string is called a separate numeral string (e.g. the quantity in “survey
of 801 voters”) in this paper. Some numeral strings would not be ambiguous because
of their meaningful units, and they are referred to as affixed numeral strings (e.g.
speed in “his serve of 240km/h”).
In the case of separate numeral strings, some structural patterns (e.g. DATE) or
syntactic functional relationships (e.g. QUANTITY as either a modifier or a head
noun) could be useful in their interpretation. However, affixed numeral strings require
the understanding of some meaningful units such as SPEED (“km/h” in “250km/h”),
LENGTH (“m” in “a 10m yacht”), and DAY_TIME (“am”, “pm” in “9:30pm”).
Past research has rarely studied the understanding of varieties of numeral strings.
Semantic categories have been used for named entity recognition (e.g. date, time,
money, percent etc.) [7] and for a Chinese semantic classification system [13]. Se-
mantic tags (e.g. date, money, percent, and time) and a character tokeniser to identify
semantic units [1] were applied to interpret limited types of numeral strings. Numeral
classifiers to interpret money and temperature in Japanese [11] have also been stud-
ied. The ICE-GB grammar [8] treated numerals as one of cardinal, ordinal, fraction,
hyphenated, multiplier with two number features - singular and plural.
Polanyi and van den Berg [9] studied anaphoric resolution of quantifiers and cardi-
nals and employed quantifier logic framework. Zhou and Su [14] employed an HMM-
based chunk tagger to recognise and classify names, times, and numerical quantities
with 11 surface sub-features and 4 semantic features like FourDigitNum (e.g. 1990)
as a year form, and SuffixTime (e.g. a.m.) as a time suffix (see also [3] and [10] for
time phrases in weather forecasts). FACILE [2] in MUC used a rule-based named
entity recognition system incorporating a chart-parsing technique and semantic cate-
gories such as PERSON, ORGANISATION, DATE, and TIME.
We have implemented a numeral interpretation system that incorporates word tri-
gram construction using a tokeniser, rule-based processing of number strings, and n-
gram based disambiguation of classification (e.g. a word trigram - left and right
strings of a numeral string). The rule-based number processing system analyses each
number string morpho-syntactically in terms of its type. In the case of a separate nu-
meral string, its assumed categories are produced at the lexical level. For example,
“20” would be QUANT, DAY, or NUMBER at the lexical level. However, affixed
numeral strings require rule-based processing based on morphological analysis be-
cause the string has its own meaningful semantic affixes (e.g. speed unit in
“24km/h”). In this paper, the different types of rule needed to classify numeral strings
are described in detail.
In the next section, the categories and rules used in this system are described. In
section 3, we describe the understanding process for both separate and affixed nu-
meral strings in more detail, and focus on classification rules. Section 4 describes
preliminary experimental results obtained with this approach, and discussion and
conclusions follow.
There are two types of dictionaries in our system: one for normal English words, with syntactic information such as lexical category, number, and verb inflectional form; the other, called the user-defined dictionary, includes symbol tokens (e.g. “(“, “)”) and units (e.g. “km”, “m”). For example, the lexical information for
“km” is (:POS (Part of Speech) LU (Length Unit)) with its meaning KILOMETER.
The system uses 64 context-free rules to represent the structural form of affixed
numeral strings. Each rule describes relationships between syntactic/semantic catego-
ries of the components (e.g. a character or a few characters and a number) produced
by morphological analysis of the affixed string. Each rule is composed of a LHS (left hand side), an RHS (right hand side), and constraints on the RHS (e.g. DATE → (DAY DOT MONTH DOT YEAR) with the constraint (LEAPYEARP DAY MONTH YEAR)).
Affixed numeral strings such as “240km/h serve” and “a 10m yacht” require knowl-
edge of their expression formats (e.g. speed → number + distance-unit + slash + time-unit) for understanding. For example, the string “240km/h” is analysed morphologically
into “240” + “km” + “/” + “h”. Our morphological analyser considers embedded
punctuation and special symbols. In the case of the string “45-year-old”, the morpho-
logical analyser separates it into “45” + “-” + “year” + “-” + “old”. Thus we use the term morphological analysis, rather than tokenisation, because each analysed symbol
is meaningful in numeral string interpretation. Table 2 shows some more results from
the morphological analyser.
After analysing the string, dictionary lookup and a rule-based numeral processing sys-
tem based on a simple bottom-up chart parsing technique [6] are invoked. Instances that
include some special forms of number (e.g. “03” in a time or day) are not stored in the lexicon. Thus if the substring is composed entirely of digits, then the substring is assigned to
several possible numeric lexical categories. For example, if a numeral string “03” is en-
countered, then the string is assigned to SECOND, MINUTE, HOUR, DAY, MONTH,
and BLDNUMBER (signifying digits after a decimal point, e.g. “0.03”). If the numeral
string is “13” or higher, then the category cannot be MONTH. Similar rules can be applied
to DAY and other categories. However, “13” can clearly be used as a quantifier.
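A small illustrative sketch of this lexical-level assignment; the value ranges below are assumptions made for illustration and are not the system's exact rules:

def numeric_categories(digits):
    # Candidate numeric lexical categories for an all-digit substring (illustrative ranges).
    value = int(digits)
    categories = {"NUMBER", "QUANT", "BLDNUMBER"}   # assumed always possible on the surface
    if value <= 59:
        categories.update({"SECOND", "MINUTE"})
    if value <= 23:
        categories.add("HOUR")
    if 1 <= value <= 31:
        categories.add("DAY")
    if 1 <= value <= 12:
        categories.add("MONTH")                     # "13" or higher can never be MONTH
    return categories

print(numeric_categories("03"))   # includes SECOND, MINUTE, HOUR, DAY, MONTH and BLDNUMBER
print(numeric_categories("13"))   # MONTH is excluded; QUANT, DAY, HOUR, ... remain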
Non-numeral strings are processed by dictionary lookup as mentioned above, and the lexical categories used are necessarily more semantic than in regular parsing. For
example, the string “m” has three lexical categories: LU (Length Unit) as a METER
(e.g. “a 10m yacht”), MILLION (e.g. “$1.5m”), and TU (Time Unit) as a MINUTE
(e.g. “12m 10s” - 12 minutes and 10 seconds).
After morphological processing of substrings, an agenda-based simple bottom-up
chart parsing process is applied with 64 context-free rules that are augmented by con-
straints. If a rule has a constraint, then the constraint is applied when an (inactive)
phrasal constituent is created. For example, the rule to process a date of the form
“28.03.2003” is DATE → (DAY DOT MONTH DOT YEAR) with the constraint
(LEAPYEARP DAY MONTH YEAR), which checks whether the date is valid. An
inactive phrasal constituent DATE1 with its RHS, (DAY1 DOT1 MONTH1 DOT2
YEAR1), would be produced and the constraint applied to verify the well-formedness
of the inactive constituent.
The well-formedness of DATE (e.g. “08.12.2003”) is verified by evaluating the
constraint (LEAPDATEP DAY MONTH YEAR). Some other rules for af-
fixed/separate numeral string interpretation are:
RULE5   LHS: AGE
        RHS: (NUMBER HYPHEN NOUN HYPHEN AGETAG) – e.g. “38-year-old man”
        Constraints: ((INTEGER-NUMBER-P NUMBER) (SEMANTIC-AGE-P NOUN) (SINGULAR-NOUN-P NOUN))
RULE21  LHS: TEMPERATURE
        RHS: (NUMBER CELC) – e.g. “40C”
        Constraints: (INTEGER-NUMBER-P NUMBER)
        where CELC means CELsius-C.
RULE22  LHS: RANGE
        RHS: (NUMBER HYPHEN NUMBER) – e.g. “20-30 minutes”
        Constraints: (RANGE-P NUMBER NUMBER)
RULE30  LHS: FLOATNUMBER
        RHS: (NUMBER DP BLDNUMBER) – e.g. “20.54 percent”
        Constraints: (FLOATNUMBER-P NUMBER DP BLDNUMBER)
        where DP means Decimal Point and BLDNUMBER means BeLow-Decimal NUMBER.
RULE42  LHS: WEIGHT
        RHS: (NUMBER WU) – e.g. “55kg”
        Constraints: NIL
        where WU means Weight Unit.
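To make the rule-plus-constraint mechanism concrete, here is a hedged Python sketch of how one such augmented rule could be applied during bottom-up parsing; the names (leap_date_p, apply_rule) are illustrative, not the authors' Lisp code, and the date-validity check merely stands in for the LEAPYEARP/LEAPDATEP constraint:

import calendar

def leap_date_p(day, month, year):
    # Constraint of the DATE rule: is DAY.MONTH.YEAR a valid calendar date?
    return 1 <= month <= 12 and 1 <= day <= calendar.monthrange(year, month)[1]

# DATE -> (DAY DOT MONTH DOT YEAR) with constraint (LEAPYEARP DAY MONTH YEAR)
DATE_RULE = ("DATE", ("DAY", "DOT", "MONTH", "DOT", "YEAR"),
             lambda d, _1, m, _2, y: leap_date_p(int(d), int(m), int(y)))

def apply_rule(rule, constituents):
    # Build an inactive phrasal constituent if the categories match and the constraint holds.
    lhs, rhs, constraint = rule
    categories = tuple(cat for cat, _ in constituents)
    values = [val for _, val in constituents]
    if categories == rhs and (constraint is None or constraint(*values)):
        return (lhs, values)
    return None

tokens = [("DAY", "28"), ("DOT", "."), ("MONTH", "03"), ("DOT", "."), ("YEAR", "2003")]
print(apply_rule(DATE_RULE, tokens))   # -> ('DATE', ['28', '.', '03', '.', '2003'])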
For separate numeral strings, the interpreted categories can be ambiguous because there is no semantic unit attached. For example, “240km/h” would be uniquely interpreted as SPEED, whereas the numeral string “20” could be either QUANT (e.g. “20 boys”) or DAY (e.g. “20 May 2005”) unless contextual information is used. Thus word trigrams are used to disambiguate the syntactic/semantic categories of numeral strings.
Word trigrams are collected when a document is read and tokenised. While to-
kenising a string (tokenisation based on a single whitespace), the numeral string is
identified with its word trigram (left and right string of the numeral string). For exam-
ple, the numeral string “100” in “The company counts more than 100 million regis-
tered users worldwide.” has its word trigram (“than” - left wordgram, “million” - right
wordgram). If a numeral string occurs at either the start or end of a sentence, then
either a left or right wordgram would be empty (i.e. NULL).
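A minimal sketch of this trigram-collection step, assuming tokenisation on a single whitespace as described; the function is purely illustrative:

def numeral_trigrams(sentence):
    # Collect (left, numeral, right) word trigrams for tokens containing digits.
    tokens = sentence.split(" ")
    trigrams = []
    for i, token in enumerate(tokens):
        if any(ch.isdigit() for ch in token):
            left = tokens[i - 1] if i > 0 else None                  # NULL at sentence start
            right = tokens[i + 1] if i < len(tokens) - 1 else None   # NULL at sentence end
            trigrams.append((left, token, right))
    return trigrams

print(numeral_trigrams("The company counts more than 100 million registered users worldwide."))
# -> [('than', '100', 'million')]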
The features used for selection of the best meaning of an interpreted numeral string
are based on syntactic and surface features. These features are joined together to reflect contextual information (e.g. using neighbouring word information). The current
system uses left and right adjacent words of the numeral string with the following
features (Table 3): lexical category (e.g. NOUN, VERB), number information (e.g.
PLURAL, SINGULAR), validity of values (e.g. valid DAY), semantic information
(e.g. MONTH concept), case of a letter (e.g. capitalisation), and punctuation marks
(e.g. PERIOD, COMMA).
With the features extracted from a numeral string’s word trigram, the contextual
features of the word trigram are used for the selection of the ‘best’ category for that
numeral string. The contextual information is extracted manually and its form is based
on a conjunction of the word trigram’s features (Fig. 1).
For QUANT category disambiguation, 22 constraints are used. The word trigram
(wordgram) for a numeral string “801” in the substring “survey of 801 voters” would
be (“of” “801” “voters”). Thus one QUANT selection rule would be:
(and (of-category-p left-wordgram (“of”))
(plural-noun-p right-wordgram (“voters”))).
If the numeral string “20” is in the string “March 20 2003”, then the category would
be DAY and one of four selection rules would be:
(and (month-string-p left-wordgram (“March”))
(valid-day-p “March” “20”) - not “March 35”
(number-p right-wordgram (“2003”))).
If a numeral string (e.g. “41” in “Lee, 41, has”) satisfies one of three AGE rules,
then the numeral string is disambiguated as AGE. The rule is:
(and (capital-letter-p left-wordgram (“Lee”))
(comma-p left-wordgram (“,” in “Lee,”))
(comma-p numeral-string (“,” in “41,”))).
This rule means that if the word in the left wordgram begins with a capital letter, if
the word in the left wordgram has a comma, and if the numeral string has a comma,
then the rule applies.
For DAYTIME category (e.g. “on Tuesday (0030 NZ time Wednesday)”), seman-
tic meanings of word trigrams are used with the numeral string’s surface pattern. The
rule is:
(and (weekday-string-p left-wordgram)
(= 4 (length numeral-string))
(daytime-string-p numeral-string)
(country-name-p right-wordgram)).
For YEAR category (e.g. “end by September 2026”), heuristic constraints are used
as follows:
(and (>= numeral-string 1000)
(<= numeral-string 2200)).
The contextual information based on word trigrams is applied to disambiguate
multiple categories resulting from the numeral string interpretation process. Two
heuristic methods are implemented to compare their results:
• Heuristic Method 1 (Method-1) – Method 1 applies wordgram constraints and
collects and then considers all satisfied constraints. For example, if the QUANT
and NUMBER constraints for a numeral string are satisfied, then the two
categories are used for the numeral’s disambiguation. With collected categories,
the annotation frequency of the categories collected from sample data (i.e. Sam-
ple data in Table 4) is used to select the best category. If the frequency of
QUANT is greater than that of NUMBER in annotation statistics, then QUANT
is selected for the category of the numeral string from the meanings processed
by a numeral string interpretation system. If a constraint for AGE category is
satisfied, then a new category, AGE, is produced because the numeral string in-
terpretation system could not produce the category. If there is no category that
satisfies the constraints, then preference rules with ordered categories based on
frequency of annotation (e.g. QUANT > MONEY > DATE > etc.) are applied to
select the best category.
• Heuristic Method 2 (Method-2) – Method-2 is similar to Method-1 except for the way the annotation statistics are applied. To select the best category for an ambiguous numeral string, the rareness of the annotation statistics is used. If both YEAR and QUANT constraints are satisfied, and the annotation frequency of YEAR is less than that of QUANT, then YEAR is selected as the category of the numeral string. If there is no category that satisfies the constraints, then preference rules with ordered categories based on rareness of annotation statistics (e.g. DATE < MONEY < YEAR < etc.) are applied to select a category (both heuristics are sketched below).
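A hedged sketch of the two selection heuristics; the frequency table below is invented purely for illustration and does not reproduce the paper's annotation statistics:

# Illustrative annotation frequencies counted from sample data (made-up numbers).
ANNOTATION_FREQ = {"QUANT": 210, "MONEY": 180, "DATE": 90, "YEAR": 85, "NUMBER": 70, "AGE": 12}

def method_1(satisfied_categories):
    # Method-1: among categories whose constraints are satisfied, prefer the most frequent one.
    return max(satisfied_categories, key=lambda c: ANNOTATION_FREQ.get(c, 0)) if satisfied_categories else None

def method_2(satisfied_categories):
    # Method-2: prefer the rarest satisfied category instead.
    return min(satisfied_categories, key=lambda c: ANNOTATION_FREQ.get(c, 0)) if satisfied_categories else None

print(method_1({"YEAR", "QUANT"}))   # -> 'QUANT' (more frequent annotation)
print(method_2({"YEAR", "QUANT"}))   # -> 'YEAR'  (rarer annotation)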
4 Experimental Results
We implemented our system in Allegro Common Lisp with its IDE. We collected 9 sets of online newspaper articles and used 91 articles (the sample data) to build disambiguation rules for the categories of numeral strings. The remaining 287 articles (the test data) were used to test the system. Among the 48498 words in the 91 articles of sample data, 886 numeral strings (1.8% of total strings, and 10 numeral strings out of 533 strings per article on average) were found. In the case of the test data, 3251 out of 144030 words (2.2% of total strings, and 11 numeral strings out of 502 strings per article on average) were identified as numeral strings (Table 4).
The proportions of numeral strings belonging to each category in the combined sample and test data were QUANT (826 of 4137, 20.0%, e.g. “survey of 801 voters”), MONEY (727, 17.6%, e.g. “$15m”, “$2.55”), DATE (380, 9.2%, e.g. “02.12.2003”), YEAR (378, 9.1%, e.g. “in 2003”), NUMBER (300, 7.3%, e.g. “300 of the Asian plants”), SCORES (224, 5.4%, e.g. “won 25 - 11”), FLOATNUMBER (8.0%, e.g. “12.5 per cent”), and others.
Table 5 shows the recall/precision/F-measure ratios (balanced F-measure) for the two disambiguation methods. Method-2 on the test data shows a better
recall ratio (77.6%) than Method-1, while Method-1 on the test data shows a better precision ratio (86.8%) than Method-2. However, the difference between Method-1 and Method-2 is 0.5% in recall and 0.5% in precision, indicating that the performance of the two methods is close to identical.
Compared to other separate categories, the system interprets the QUANT category
better than YEAR and NUMBER because the disambiguation module for QUANT
category has more constraints based on wordgram information (i.e. 22 constraints for
QUANT, 7 for YEAR, and 6 for NUMBER).
For the disambiguation process using Method-1 and Method-2, 2213 (53.5%) of the 4137 numeral strings were ambiguous after the numeral string interpretation process. Among these, the numbers of satisfied constraints were: no constraint (746, 33.7%), one constraint (1271, 57.4%), two constraints (185, 8.4%), and three constraints (11, 0.5%).
References
1. Asahara, M., Matsumoto Y.: Japanese Named Entity Extraction with Redundant Morpho-
logical Analysis. Proceedings of HLT-NAACL 2003. (2003) 8-15
2. Black, W., Rinaldi, F., Mowatt, D.: FACILE: Description of the NE system used for
MUC-7. Proceedings of MUC-7. (1998)
3. Chieu, L., Ng, T.: Named Entity Recognition: A Maximum Entropy Approach Using
Global Information. Proceedings of the 19th COLING. (2002) 190-196
4. CoNLL-2003 Language-Independent Named Entity Recognition.
http://www.cnts.uia.ac.be/conll2003/ner/2. (2003)
5. Dale, R.: A Framework for Complex Tokenisation and its Application to Newspaper Text.
Proceedings of the second Australian Document Computing Symposium. (1997)
6. Earley, J.: An Efficient Context-Free Parsing Algorithm. CACM. 13(2) (1970) 94-102
7. Maynard, D., Tablan, V., Ursu, C., Cunningham, H., Wilks, Y.: Named Entity Recognition
from Diverse Text Types. Proceedings of Recent Advances in NLP. (2001)
8. Nelson, G., Wallis, S., Aarts, B.: Exploring Natural Language - working with the British
Component of the International Corpus of English, John Benjamins, The Netherlands.
(2002)
9. Polanyi, L., van den Berg, M.: Logical Structure and Discourse Anaphora Resolution. Pro-
ceedings of ACL99 Workshop on The Relation of Discourse/Dialogue Structure and Ref-
erence. (1999) 10-117
10. Reiter E., Sripada, S.: Learning the Meaning and Usage of Time Phrases from a parallel
Text-Data Corpus. Proceedings of HLT-NAACL2003 Workshop on Learning Word
Meaning from Non-Linguistic Data. (2003) 78-85
11. Siegel, M., Bender, E. M.: Efficient Deep Processing of Japanese. Proceedings of the 3rd
Workshop on Asian Language Resources and International Standardization. (2002)
12. Torii, M., Kamboj, S., Vijay-Shanker, K.: An investigation of Various Information
Sources for Classifying Biological Names. Proceedings of ACL2003 Workshop on Natu-
ral Language Processing in Biomedicine. (2003) 113-120
13. Wang, H., Yu, S.: The Semantic Knowledge-base of Contemporary Chinese and its Application in WSD. Proceedings of the Second SIGHAN Workshop on Chinese Language
Processing. (2003) 112-118
14. Zhou, G., Su, J.: Named Entity Recognition using an HMM-based Chunk Tagger. Pro-
ceedings of ACL2002. (2002) 473-480
RFID Tag Based Library Marketing for Improving
Patron Services
Toshiro Minami
Abstract. In this paper, we deal with a method of utilizing RFID tags attached to books to extract tips that are useful for improving library services to patrons. RFID is an AIDC (Automatic Identification and Data Capture) technology, with which we can automatically collect data on how library materials are used and how often. By analyzing such data we are able to acquire knowledge that helps librarians better perform their jobs, such as deciding which books to collect, how to help their patrons, and so on. We call this method “in-the-library marketing,” because the data concern how library materials are used by patrons in the library. It is even more effective if we also add data captured from outside the library and integrate them into overall “library marketing.” Furthermore, we illustrate an architecture for protecting patron-related private data from leakage, which is another important issue for library marketing.
1 Introduction
browse the original contents at home. Such digital materials are provided on Web-based systems and thus it is easy to automatically collect data on which materials are used and at which times. By analyzing these data and acquiring knowledge about which services are more beneficial than others for patrons, librarians are able to get good tips for improved patron services.
For the second issue, the number of libraries that have installed an RFID (Radio Frequency Identification) tag system [3], a representative ubiquitous technology, has been increasing rapidly over the last couple of years. In Japan, for example, public libraries are more aggressive than other kinds of libraries. One of the reasons might be that many town libraries have been located in community centers and many towns have been wishing to construct library buildings of their own. It is a good chance for them to install the RFID tag system for efficiency of jobs and improvement of security and patron services.
Considering such changes, it is easy to expect that most library materials will be digitized and physical materials, e.g. ordinary books and magazines, will be used only in a supplementary way in the far future. Thus we are in a transitional stage from libraries of physical materials to libraries of digital materials. During this transitional period, libraries are hybrid libraries in which physical and digital materials coexist, and the ratio of digital materials increases gradually.
One of the most important things we have to do now is to establish a hybrid library
system so that libraries can deal with both physical and digital materials in a uniform
way and the transition goes seamlessly.
From this point of view, RFID technology is very appropriate for libraries to introduce. RFID is one of the technologies called AIDC (Automatic Identification and Data Capture). AIDC technology gives two big advantages to libraries. First, it provides a better and more efficient method of managing physical materials. Secondly, it provides a means of automatically collecting digital data on how such physical materials are used. By utilizing such technologies we can easily collect data about which materials are used, when they are used, by whom they are used, and so on. The digitization in the first issue and the digital data collected with AIDC technology are easy to integrate, and thus it is easy to construct a big and comprehensive database (DB) by collecting all such data. In this way the data about physical materials and digital materials can be treated in a uniform way. A library having an RFID tag system is also called a “u-library (i.e. ubiquitous library)” [5, 8].
Once we have the integrated DB, we are able to extract information and acquire knowledge by applying datamining (DM) techniques to analyze the data. In current, i.e. traditional, libraries, the circulation data are virtually the only digital data that can be used for datamining. Thus combining the new data collected from digitized services with those from RFID, and acquiring useful knowledge for improving patron services, is a new and challenging application field for knowledge acquisition (KA) researchers. Such knowledge is expected to be used also for revising the ways services are delivered and for starting new library services that will be convenient for the patrons of the library. In this paper, we call this new method “in-the-library marketing” [6, 9].
Personalized patron service is very important in the next generation of library services. We will call it the “My Library” service. On the top page of My Library, patrons log in to the service by typing their library IDs and passwords. They can then get their personal information: for example, which materials they are borrowing and when they are due. They may also get a list of recommended books among the materials that have just been cataloged and are available for loan.
In order to provide such personalized services, the library needs profile information about its patrons. The more appropriate information the library has, the more accurate and sophisticated the services it can provide. From this point of view, AIDC is again a very important key technology for libraries.
The AIDC technology mostly used in libraries so far is the barcode. Compared to barcode technology, RFID technology has advantages in a couple of respects: it is faster at identifying IDs, it can read tags that are located in places invisible to the readers, e.g. tags attached inside books, and it can recognize multiple IDs in one action. In these respects, RFID is more appropriate for automatically capturing data about changes in the status of materials and patrons.
This paper consists of five sections. In the next section, i.e. Section 2, we briefly explain what the RFID tag system is like and how it is used in libraries. In Section 3, we discuss in what ways RFID tags can be used for in-the-library marketing, followed by the security issue in Section 4. Finally, in Section 5, we conclude the paper.
Throughout this paper, we mainly deal with how to automatically, i.e. easily, collect data that are appropriate for acquiring knowledge useful for improving the patron services of libraries. This is the very first step toward a knowledge acquisition system for library marketing.
In this section we give a brief view of what the RFID system is all about, with a special focus on how it is currently used in library applications. First of all we describe what the RFID tag technology is like in Figure 1. The RFID system consists of two components: tags and reader/writers (R/Ws). Tags can communicate with reader/writers that are located a short distance from the tags.
In Figure 1 the RFID tag at the right-hand side consists of an IC chip and an antenna. It has no battery and thus cannot run standalone. At the left-hand side is an R/W, which provides energy to the tag. The tag gets energy from the R/W by electro-magnetic induction via the antenna and waits until sufficient energy is charged. When it is ready, it communicates with the R/W and exchanges data such as its ID and status data by making use of the same antenna. At the back end of the R/W are applications such as databases.
The frequencies used in RFID systems range from about 100kHz up to GHz bands,
which is in the ISM (Industrial, Scientific and Medical) bands [3]. The most popularly
used frequency among them is 13.56MHz. It is mostly appropriate for applications
with medium read distance, i.e. from about 1cm to 1m. Other frequencies such as
UHF band and microwave band are also under evaluation. However, at the moment,
13.56MHz is considered to be the most appropriate one for the library application.
The tags described so far are called “passive tags” because they have no batteries and need external energy, as shown in Figure 1. There is another type of tag, called “active tags,” which are equipped with batteries and thus can work without an external energy supply. They emit radio waves autonomously, and the R/Ws receive the radio waves and get the data from the tag. The most important advantage of active tags is that the data transmission distance is much longer than for passive tags, for example about 10m to 20m, and thus the security gate becomes more reliable. On the other hand, active tags have disadvantages: they are thicker, heavier and much more expensive, and have a shorter life span, probably up to a couple of years, than passive ones. However, these problems might be overcome in the future and such tags may come to be widely used together with passive ones in libraries.
Figure 2 is an example that shows how an RFID tag is attached to a book (in the Chikushi Branch Library of Kyushu University Library [4], Japan). The tag is formed as a label on which the library name is printed together with the university logo. The material ID is also printed in barcode on the label. The barcode is supposed to be used when the material is carried to another library via ILL (Inter-Library Loan), i.e. for interoperability, and when the tag goes bad and becomes broken, i.e. for insurance.
The tag is attached to the first leaf of the book, next to the cover, in this case. It is less likely to be damaged there than when attached to the outside of the cover. However, it is then more laborious to read the ID from the barcode label: you have to open the cover first and then read the ID. On the other hand, when we read the ID via the RFID tag we just put the book near the reader without opening the cover.
As can be seen in Figure 2, the tag is attached very close to the spine of the book. This is because it is read more reliably there than at other places when we make an inventory. We use portable readers and scan the books as they are stored on the bookshelves. The portable reader is less powerful than the normal desktop reader and the security gate. Thus it is preferable to place the tag as close to the reader as possible, i.e. close to the spine.
Compared to the barcode system that is mostly used now, the RFID tag system has the advantage that it is much easier to position materials. As a result, the self-checkout machine is easier to use, which is good for children and elderly patrons. Figure 3 is an example. A patron puts a couple of books on the designated area; the machine then displays the book IDs, and the process ends when the patron pushes the OK button on the touch screen. The list of borrowed books is printed by the printer at the right-hand side.
This is another type of advantage of RFID over barcode: it is not only more efficient but also more sophisticated and has a much easier user interface. This is a very important point. So far the dominant reason why libraries introduce the RFID tag system is that it is more efficient; i.e. circulation is faster, the running cost is supposed to be lower, and thus the number of librarians needed will be smaller, etc.
Here the most important motivation for introducing the RFID tag system is efficiency. We will call this step 1 of library automation. What will come next, in step 2? It should be effectiveness. By utilizing the RFID tag system we can create new methods not only for efficiency but also for giving better, more sophisticated, more advanced services so that customer satisfaction increases. This is what we would like to pursue in library automation.
Figure 4 shows another example: a self book return machine (in Baulkham Hills Library [2], Australia). Patrons are supposed to return their borrowed books using this machine. Inside the opening is a lid that is normally locked, so you cannot simply put books inside the machine. You are supposed to hold the book for a while in the opening as you return it. Then the R/W installed near the lid reads the ID of the book and checks whether or not it is an appropriate book to return. The lock is released when the machine recognizes its appropriateness and you can drop the book into the box of the return machine. This is another good example of using RFID technology. It would be very difficult to make a machine that discriminates appropriate books from inappropriate ones if barcodes were used for recognizing the book ID.
In a Web page of Baulkham Hills Shire Council, they say “This technology gives
staff more opportunity to move from behind the desk and focus on value-added, pro-
active customer service,” which is the most important objective of the step 2 of library
automation.
An intelligent shelf is a bookshelf that has R/Ws in it so that it can read which books are stored on which shelf. Note that a similar shelf used in retail stores is often called a smart shelf. Figure 5 shows an intelligent bookshelf, in which four antennas are placed like book separators in the bookshelf. The controller activates the antennas one by one and recognizes the book IDs in the active shelf. It also switches the active bookshelves themselves one by one so that one controller can deal with tens of antennas.
Currently the bookshelf R/Ws cannot detect in which part of a shelf a book is located. However, the R/W system could be made more sophisticated so that books can be located more accurately, and eventually exactly where and in what order they are arranged on a shelf.
By using such bookshelves the library system can detect whenever a book is taken out of the shelf and whenever it is returned to a shelf. For example, the library system can make a list of books that were returned to the wrong shelves. By using this list librarians are able to relocate such books to their right positions.
Also, such data can be used to rank the books according to the frequencies of take-out and return, which indicate how frequently the books are used in the library. This will give librarians a good tip when they evaluate their book collection policy.
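As a rough illustration (not an actual library system API), the two analyses just mentioned could start from shelf-reader data along the following lines:

from collections import Counter

def misplaced_books(observed, catalogue):
    # observed: {shelf_id: set of book_ids read by that shelf's R/Ws}
    # catalogue: {book_id: home_shelf_id}
    wrong = []
    for shelf, books in observed.items():
        wrong.extend((b, shelf, catalogue.get(b)) for b in books if catalogue.get(b) != shelf)
    return wrong   # (book, shelf it was found on, shelf it belongs to)

def usage_ranking(events):
    # events: list of (book_id, 'taken' or 'returned') detected by intelligent shelves.
    return Counter(b for b, kind in events if kind == "taken").most_common()

print(misplaced_books({"S1": {"b1", "b7"}}, {"b1": "S1", "b7": "S3"}))   # [('b7', 'S1', 'S3')]
print(usage_ranking([("b1", "taken"), ("b1", "returned"), ("b2", "taken"), ("b1", "taken")]))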
Currently, though, the intelligent shelf is very expensive. One example price is one million yen, or about ten thousand US dollars, for one line of bookshelves. It is far too expensive to replace all the bookshelves currently used in libraries. However, it is worth considering first replacing just one or a couple of bookshelves with intelligent shelves and increasing the number gradually.
For such a purpose, one good candidate is the bookshelves for newly registered books and/or for those just returned by a patron. Such books attract patrons’ interest and thus will be used with high frequency. By analyzing such data, librarians will be helped by the extracted information in choosing new books to be purchased.
Another candidate, specifically in university libraries, is the books that are designated as subtexts by teachers. These books usually appear in the syllabuses. It is a great benefit for students to read such textbooks in the library.
Fig. 6. (a) A patron reading a book at the table; (b) R/Ws under the table
If we use intelligent shelves for such books, the library can collect detailed data on how these books are used; e.g. for each book, when it was taken off the shelf and when it was returned, and maybe who did it.
By collecting and analyzing these data, the library might be able to decide how many volumes of a title to buy. If a book has little or no usage history, the library can let the teacher who recommended the book know this fact. Then he/she may encourage the students of the class to use this textbook more.
Book trucks might be a good choice in some situations. In some libraries the books taken from the ordinary bookshelves are supposed to be returned to book trucks. If we attach shelf readers to the book trucks, the system can record when the books are returned to the book trucks.
An intelligent browsing table is a table in a browsing room of the library that has R/W(s) in it. Figure 6 shows an example browsing table trialled in AIREF Library in Fukuoka City, Japan [1]. In this figure a patron is reading a book at the table. He has a couple of books around him (a), and two RFID readers (b) detect them and send the data to their server.
By analyzing the data from the intelligent browsing table(s), librarians are able to obtain information on which books are read, for how long, how often, and so on. Such information is useful for shelf arrangement and book collection. If the table readers can also collect patron IDs, they can get information on who reads which books. By analyzing these data we can find which books are often read together by which patrons. This information is useful for book recommendation services to patrons using collaborative filtering technology [6, 12], which is well known in the agent research community.
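A small sketch of the kind of co-reading statistic such a recommender would start from, assuming the table readers log (patron_id, book_id) observations; all names and data here are illustrative:

from collections import defaultdict
from itertools import combinations

log = [("p1", "b1"), ("p1", "b2"), ("p2", "b1"), ("p2", "b3"), ("p3", "b1"), ("p3", "b2")]

books_by_patron = defaultdict(set)
for patron, book in log:
    books_by_patron[patron].add(book)

read_together = defaultdict(int)           # how often two books are read by the same patron
for books in books_by_patron.values():
    for a, b in combinations(sorted(books), 2):
        read_together[(a, b)] += 1

print(dict(read_together))                 # {('b1', 'b2'): 2, ('b1', 'b3'): 1}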
3.2 Analysis
Automatically or manually collected data form a big database. It consists of, for example, catalog data for materials, circulation records, data from intelligent shelves, and others. We are able to extract statistical data not only from one type of data but also from several types of data by combining them.
From circulation records, for example, we are able to know how many books a patron has borrowed so far, per year, per month, and so on. We are also able to know how many books were borrowed on each day of the week, and in which time zone of the day. By combining these with patrons’ personal data, we are able to know, for example, what sorts of materials, and how many, the members of a university department borrow, and the differences in borrowed materials between two departments.
Furthermore, we will be able to acquire knowledge such as “patrons who come to the library in the morning borrow more materials than those who come in the afternoon” by applying a DM algorithm.
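A minimal aggregation sketch, assuming circulation records of the form (patron_id, department, timestamp); the field names and records are illustrative:

from collections import Counter
from datetime import datetime

records = [
    ("p01", "Engineering", datetime(2006, 5, 8, 10, 30)),
    ("p02", "Law",         datetime(2006, 5, 8, 15, 45)),
    ("p01", "Engineering", datetime(2006, 5, 9, 11, 10)),
]

loans_per_patron  = Counter(p for p, _, _ in records)
loans_per_weekday = Counter(t.strftime("%A") for _, _, t in records)
loans_per_daypart = Counter("morning" if t.hour < 12 else "afternoon" for _, _, t in records)
loans_per_dept    = Counter(d for _, d, _ in records)

print(loans_per_patron, loans_per_weekday, loans_per_daypart, loans_per_dept)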
Such data and knowledge acquired by analyzing the databases will become good
tips for libraries to improve their patron services.
Figure 7 illustrates the cyclic structure of databases and services. On the left are the databases in a library. The catalog database is constructed partly manually and partly automatically. The circulation database is constructed automatically by using an AIDC technology. Personal data of patrons are obtained when the library issues patron cards.
On the right of the figure are the library services. The OPAC service uses the materials’ catalog data. In order to provide the “My Library” service, the system uses some patron data, circulation data, and others. The catalog data and other information are also used for the reference service. The big arrow between databases and services indicates this usage relationship.
On the other hand, we can collect log data from the services. From the OPAC service, we are able to obtain data on which keywords were used, which library materials were chosen for viewing in detail, and so on. From the My Library service, we can record who logs in to the service, when they use it, how much time they spend, which specific information they access, and so on.
These data themselves form another database, indicated as the “Service Log” in Figure 7. We can use this database as well for improving various services.
Take, for example, the OPAC service. Suppose a patron is trying to find appropriate keywords. When he or she types a keyword, it may be more convenient for the patron if the OPAC system shows some keywords related to the given one. This is a
keyword recommendation [6] function. The keyword log data from the OPAC system itself provide its basis data. The system calculates the distance between two keywords in advance from these data. Then the recommended keywords are chosen among the keywords that are close to the keyword given by the patron.
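One plausible way to realize this step, sketched under the assumption that the OPAC log can be grouped into per-session keyword sets and that co-occurrence counts serve as the (inverse) distance; none of this is prescribed by the paper:

from collections import Counter
from itertools import combinations

sessions = [{"rfid", "library"}, {"rfid", "privacy"}, {"library", "marketing", "rfid"}]

co_occurrence = Counter()
for keywords in sessions:
    for a, b in combinations(sorted(keywords), 2):
        co_occurrence[(a, b)] += 1

def recommend(keyword, k=3):
    # Keywords that most often co-occur with the given one, i.e. are "closest" to it.
    scores = Counter()
    for (a, b), n in co_occurrence.items():
        if a == keyword:
            scores[b] += n
        elif b == keyword:
            scores[a] += n
    return [w for w, _ in scores.most_common(k)]

print(recommend("rfid"))   # -> ['library', 'privacy', 'marketing']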
If the My Library service provides a function for attaching comments to materials, these data can be used for the OPAC service as well. Suppose we give a keyword and get a list of related library materials. The system can provide a function for displaying what comments are attached to each material. The comments will help the patron decide whether or not to read it, and whether to read it in the library or borrow it once he/she decides to read it.
3.3 Improvement
As was mentioned in previous sections, RFID technology is very suitable for in-the-library marketing. By collecting and analyzing the data we can acquire useful knowledge that helps with providing better services to patrons and with managing the library more efficiently and more effectively.
One important issue here is how to feed the acquired knowledge back to the patrons. One idea for improving the patron interface is the “virtual bookshelf” [9, 13]. Figure 8 is an example screen. This is supposed to be a result screen of an OPAC system, in which the list of books is displayed with images of book spines. This way of displaying results makes it much easier to intuitively recognize what books are in the list than a display in text form. This is because we recognize and memorize books not only by title and/or authors, but also by the design of the book: color, font, arrangement, and so on.
One method of attaching extra information to each of the books is to put a link on the image of the spine of the book. If we click on a book spine, the information relating to the book is displayed in a popup window. The information includes catalog data such as title, author, year of issue, etc. together with extra information such as comments by patrons and/or librarians.
The virtual bookshelf is also well suited to personalized library services. Patrons may keep a number of their own virtual bookshelves on the library Web site. They can arrange the virtual books on whichever bookshelves they want. They are also able to rearrange the books whenever they want, as if the bookshelves were their own and were always in their study room. They can also attach comments to the books from their personal point of view. The library system will collect such data and feed back the statistical results and other data, including the evaluation data and comments, to the patrons. One possible service is to recommend books to a patron that might be useful to put on the patron’s bookshelf.
4 Security Issue
Security issues are also important for RFID tag systems [15]. In this section we discuss two types of security issues: how to protect library materials from theft and how to protect patron-related private data from leakage.
Take the first one. The RFID tag system is used as a replacement for the magnetic tag system, which is mostly used in university libraries in Japan. One reason for introducing the RFID tag system is to replace the magnetic tag used for security and the barcode used as the book identification code with a single RFID tag. This reduction of book processing cost somewhat offsets the cost of installing the RFID tag system and the cost of tags. This aspect belongs to step 1 of library automation with the RFID tag system.
For the step-2 aspect, the RFID tag system has the advantage of providing much more detailed data about materials and patrons. By utilizing this advantage the system can analyze the data and extract information such as what types of materials are stolen more often than others, what attributes correlate with the possibility of theft, and so on.
Take the second issue, i.e. protection from leakage of private data. Figure 9 illustrates one possible solution to this issue. The basic idea of this system is to set up a separate private data management server (PDMS) and control the flow of data to and from the server.
Fig. 9. Raw data and analysis programs in the private data management server
The data of the library database that are needed for analysis are stored in the PDMS. Only the manager operator of this machine can access these original data. The analysis programs are invoked either automatically or by an ordinary operator of the system. An ordinary operator is able to use only designated programs. The processed results obtained by such analysis programs are accessible to the ordinary operators and thus are allowed to be copied outside the machine. However, ordinary operators are not allowed to access the original data, in order to protect these data from leakage.
This mechanism is conceptually similar to a firewall system for network traffic. The major purpose of our mechanism is to protect the data from leakage caused by operational errors.
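A purely conceptual sketch of this separation; the role names and functions below are assumptions made for illustration, not an implementation of the PDMS described above:

RAW_DATA = {"circulation": ["..."]}      # original patron-related records (manager-only)
RESULTS = {}                             # processed outputs that may leave the machine

def run_designated_analysis(role, name, program):
    # Ordinary operators may only invoke designated programs; they see results, not raw data.
    if role not in ("manager", "ordinary"):
        raise PermissionError("unknown role")
    RESULTS[name] = program(RAW_DATA)    # the program runs inside the PDMS
    return RESULTS[name]

def read_raw_data(role):
    if role != "manager":
        raise PermissionError("only the manager operator may access the original data")
    return RAW_DATA

print(run_designated_analysis("ordinary", "loan_count", lambda data: len(data["circulation"])))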
5 Concluding Remarks
The background of this research is that the materials dealt with by libraries are changing from physical ones, such as books made of paper, to digital ones. A big difference between these two kinds of materials is that digital materials are easy to deliver via a network.
So patrons do not need to visit a library to get materials. However, considering that we already have a huge collection of books, library materials will be a mixture of the two. So we expect libraries to remain hybrid libraries for a long time.
In this paper, we propose a model for hybrid libraries to transition seamlessly from the library where most materials are made of paper, i.e. books and journals, to the electronic library, where most materials are provided as digital data. The key idea of this model is the good use of the RFID tag system.
RFID is a representative technology among automatic identification and data capture (AIDC) technologies. With this technology, we have the advantages of faster and multiple ID recognition, an easy-to-use operational interface, etc., compared to the barcode system, which is the currently dominant AIDC technology.
In the first step of introducing RFID for library automation, the major aim is to gain efficiency and cut costs. For example, in a library of the University of Connecticut, US [14], the number of counters could be reduced by half after installing the RFID tag system. Even though the initial cost needed for RFID equipment is very high, the running cost is cheaper than that of barcodes. However, in this paper we would like to put strong stress on the importance of the second step for RFID. In the second step, we get extra data by utilizing the RFID tag system and use the data for acquiring knowledge that is useful for improving library services (i.e. one of the library marketing [7] methods).
The key equipment for the second step is RFID readers attached to the book-
shelves, book trucks, browsing tables, and so on. By collecting and analyzing such
data, we can extract statistical information and useful tips.
In order to feed the knowledge back to the user, we recommend the use of a virtual shelf system, which makes it more intuitive and easier to recognize the books in a list. We have also illustrated that this system is a good platform for providing extra information to the patrons.
Acknowledgments
I greatly acknowledge my co-researchers Prof Daisuke Ikeda of Kyushu University,
Prof Takuya Kida of Hokkaido University, and Prof Kiyotaka Fujisaki for their dis-
cussions on digital libraries and RFID technologies. This research was partially sup-
ported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for
Scientific Research (B), 16300078, 2005.
References
1. AIREF Library: http://www.kenkou-fukuoka.or.jp/airef/tosyokan3.htm (in Japanese)
2. Baulkham Hills Shire Library Services: http://www.baulkhamhills.nsw.gov.au/library/
3. Finkenzeller, K.: RFID Handbook (Second Edition). John Wiley & Sons (2003)
4. Kyushu University Library: http://www.lib.kyushu-u.ac.jp/
5. Lee, Eung-Bong: Digital Library & Ubiquitous Library. Science and Technology Informa-
tion Management Association Academic Seminar (V) (2004) (in Korean)
6. Oda, Mitsuru, Minami, Toshiro: From Information Search towards Knowledge and Skill
Acquisition with SASS, Pacific Rim Knowledge Acquisition Workshop (PKAW2000),
(2000)
7. Ohio Library Council: http://www.olc.org/marketing/
8. Minami, Toshiro: Needs and Benefits of Massively Multi Book Agent Systems for u-
Libraries, T. Ishida, L. Gasser, and H. Nakashima(eds.), MMAS 2004, LNAI3446,
Springer, (2005) 239-253
9. Minami, Toshiro: On-the-site Library Marketing for Patron Oriented Services. Bulletin of
Kyushu Institute of Information Sciences, Vol.8 No.1 (2006) (in Japanese)
10. Ranganathan, S.R.: The Five Laws of Library Science, Bombay Asia Publishing House
(1963)
11. Ramparany, F., Boissier, O.: Smart Devices Embedding Multi-Agent Technologies for a
Pro-active World, Proc. Ubiquitous Computing Workshop (2002)
12. Resnick, P. and Varian, H. R. (Guest Eds.): Recommender Systems. Communications of
ACM, Vol. 40 No. 3 (1997) 56-89
13. Sugimoto, Shigeo et al.: Enhancing usability of network-based library information system -
experimental studies of a user interface for OPAC and of a collaboration tool for library
services. Proceedings of Digital Libraries '95 (1995) 115-122
14. University of Connecticut Libraries: http://spirit.lib.uconn.edu/
15. Weis, S.A., Sarma, S.E., Rivest, R.L., Engels, D.W.: Security and Privacy Aspects of
Low-Cost Radio Frequency Identification Systems. Proc. First International Conference on
Security in Pervasive Computing. Lecture Notes in Computer Science, Vol. 2802.
Springer-Verlag (2003) 201-212
Extracting Discriminative Patterns from Graph
Structured Data Using Constrained Search
1 Introduction
Over the last decade, there has been much research work on data mining, which aims to find useful and interesting knowledge in the massive data that are electronically available. A number of studies have been made in recent years especially on mining frequent patterns from graph-structured data, or simply graph mining, because of the high expressive power of graph representation [1,13,6,12,4,5].
Chunkingless Graph Based Induction (Cl-GBI) [8] is one of the latest algo-
rithms in graph mining and an extension of Graph Based Induction (GBI) [13]
that can extract typical patterns from graph-structured data by stepwise pair
expansion, i.e., by recursively chunking two adjoining nodes. Similarly to GBI
and its other extension, Beam-wise GBI (B-GBI) [6], Cl-GBI adopts the stepwise pair expansion principle, but never chunks adjoining nodes nor contracts the graph. Instead, Cl-GBI regards a pair of adjoining nodes as a pseudo-node
and assigns a new label to it. This operation is called pseudo-chunking and can
fully solve the reported problems caused by chunking, i.e., ambiguity in selecting
nodes to chunk and incompleteness of the search. This is because every node is
available to make a new pseudo-node at any time in Cl-GBI. However, Cl-GBI requires more time and space in exchange for gaining the ability to extract overlapping patterns. Thus, it can happen that Cl-GBI cannot extract patterns that are large enough to describe the characteristics of the data within a limited time and with given computational resources. In such a case, the extracted patterns may not be of much interest to domain experts.
To improve the search efficiency, in this paper we propose a method of guiding the search of Cl-GBI using domain knowledge or the interests of domain experts. The basic idea is to adopt patterns representing such knowledge or interests as constraints on the search, in order to restrict the search space effectively and to extract patterns that are more discriminative or interesting than those extracted by the current Cl-GBI. For that purpose, we use two types of patterns as constraints: patterns that should be included in the extracted ones, and patterns that should not be included in them. These two types allow us to specify patterns of interest and patterns that are trivial for domain experts, respectively. We also experimentally show the effectiveness of the proposed search method by applying the constrained Cl-GBI to the hepatitis dataset, a real-world dataset.
In this paper, we deal with only connected labeled graphs, and use information
gain [10] as the discriminativity criterion. In what follows, “a pair” denotes a
pair of adjoining nodes in a graph.
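As a concrete illustration of the discriminativity criterion, the following is a minimal sketch (not the authors' implementation; the helper names and the toy data are our own) of how the information gain of a candidate pattern could be computed when the pattern splits a set of class-labelled graphs into those that contain it and those that do not.

import math
from collections import Counter

def entropy(counts):
    """Shannon entropy of a mapping from class label to count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def information_gain(class_labels, contains_pattern):
    """Information gain of splitting graphs by the presence of a pattern.

    class_labels     : one class label per graph (e.g. 'R' or 'N')
    contains_pattern : one boolean per graph
    """
    total = Counter(class_labels)
    with_p = Counter(l for l, c in zip(class_labels, contains_pattern) if c)
    without_p = Counter(l for l, c in zip(class_labels, contains_pattern) if not c)
    n = len(class_labels)
    remainder = (sum(with_p.values()) / n) * entropy(with_p) \
              + (sum(without_p.values()) / n) * entropy(without_p)
    return entropy(total) - remainder

# toy example: five graphs, classes R/N, pattern present in the first two graphs
print(information_gain(['R', 'R', 'R', 'N', 'N'], [True, True, False, False, False]))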
Stepwise pair expansion is the essential operation in GBI and its variants; it recursively generates new nodes from pairs of two adjoining nodes and the links between them. In GBI, a pair is selected according to a certain frequency-based criterion, and all of its occurrences in the graphs are replaced with a node having a newly assigned label. Namely, each graph is rewritten each time a pair is chunked and is never restored in any subsequent chunking.¹ On the one hand, this chunking mechanism is suitable for extracting patterns from either a very large single graph or a set of graphs, because extracted patterns can grow rapidly. On the other hand, it involves ambiguity in selecting the nodes to chunk, which causes a crucial problem: some overlapping subgraphs may be overlooked due to an inappropriate chunking order. The beam search adopted by B-GBI alleviates this problem by chunking the b (beam width) most frequent pairs and copying each graph into the respective states, but it does not completely solve it because the chunking process is still involved.
¹ Note that this does not mean that the link information of the original graphs is lost. It is always possible to restore how each node is connected in the extracted subgraphs.
Fig. 1. Process of pseudo-chunking in Cl-GBI (Levels 0, 1, and 2): the pair of nodes 1 and 3 is registered as pseudo-node 10, and the pair of node 2 and pseudo-node 10 as pseudo-node 11
In contrast to GBI and B-GBI, Cl-GBI does not chunk a selected pair, but
regards it as a pseudo-node and assigns a new label to it. Thus, graphs are neither “compressed” nor copied into respective states. Figure 1 illustrates examples of
pseudo-chunking in Cl-GBI, in which a typical pattern consisting of nodes 1, 2,
and 3 is extracted from the input graph. Cl-GBI first finds the pair 1 → 3 based
on its frequency, and pseudo-chunks it, i.e., registers it as the pseudo-node 10,
but does not rewrite the graph. Then, in the next iteration, it finds the pair
2 → 10, and pseudo-chunks and registers it as the pseudo-node 11. As a result,
the typical pattern is extracted. In the rest of the paper, we refer to each iteration
in Cl-GBI as “level”.
The algorithm of Cl-GBI is shown in Fig.2. The search of Cl-GBI is controlled
by the following parameters: a beam width b, the maximal number of levels of
pseudo-chunking N , and a frequency threshold θ. In other words, at each level,
the b most frequent pairs are selected from a set of pairs whose frequencies are
not less than θ, and are pseudo-chunked.
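For readers who prefer code to prose, the control loop just described can be sketched as follows. This is only a schematic rendering of the behaviour attributed to Cl-GBI above; enumerate_pairs, frequency, and register_pseudo_node are hypothetical helpers standing in for the operations of the actual algorithm.

def cl_gbi(graphs, b, N, theta, enumerate_pairs, frequency, register_pseudo_node):
    """Schematic Cl-GBI control loop: pseudo-chunking without rewriting the graphs.

    b     : beam width (number of pairs pseudo-chunked per level)
    N     : maximal number of levels of pseudo-chunking
    theta : frequency threshold
    The three trailing arguments are hypothetical helper functions.
    """
    pseudo_nodes = []                                   # registered pseudo-nodes so far
    for level in range(1, N + 1):
        # enumerate candidate pairs of adjoining nodes (including pseudo-nodes)
        candidates = enumerate_pairs(graphs, pseudo_nodes)
        # keep only pairs whose frequency is not less than theta
        frequent = [p for p in candidates if frequency(p, graphs) >= theta]
        # pseudo-chunk the b most frequent pairs; the graphs themselves stay untouched
        best = sorted(frequent, key=lambda p: frequency(p, graphs), reverse=True)[:b]
        pseudo_nodes.extend(register_pseudo_node(pair, level) for pair in best)
    return pseudo_nodes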
Fig. 3. Examples of (a) INpattern constraints and (b) EXpattern constraints: candidate patterns are either extracted or discarded depending on whether they satisfy the constraints
Not only patterns that completely include the given INpatterns, but also patterns that include at least one proper subgraph of the INpatterns should be extracted, in order not to prevent patterns satisfying the imposed constraints from being generated in the succeeding steps based on the stepwise pair expansion principle. This case is illustrated at the bottom right of Fig. 3 (a). We call such a pattern, which includes only proper subgraphs of the given INpatterns, a neighborhood pattern. In other words, an INpattern does not necessarily have to be identical to a pattern representing domain knowledge or the interests of domain experts, but need merely include at least one of its proper subgraphs. Namely, Constraint 1 is useful to specify patterns of interest for domain experts and to search aggressively around them, while Constraint 2 is useful to avoid extracting patterns that are trivial or uninteresting for domain experts.
In addition, in the case of Constraint 1, we can discard a pair if it does not include any node/link labels appearing in the given INpatterns, as shown in Fig. 3 (a). This is because such a pair, or pattern, can never grow into a pattern that includes at least one of the given INpatterns. Even if the discarded pattern is in fact a subgraph of a pattern P satisfying all constraints and a given evaluation criterion such as the minimal frequency, P can still be constructed from another pattern including at least one node/link label appearing in the given INpatterns. In other words, enumerating as candidates to be pseudo-chunked only the pairs with at least one node/link label in the given INpatterns allows us to restrict the search space effectively. Note, however, that a label with high frequency does not work effectively as a constraint, because it may also appear in many pairs which the user intends to exclude from the search space. Thus, if a set of node (link) labels contains such frequent labels, we do not use that set to restrict pairs in the enumeration process.
Tnum(x, y) denotes the total number of occurrences in x of the labels appearing in y, i.e., the sum of f(x, Lk) over all labels Lk ∈ L(y), where L(y) is the set of labels appearing in y and f(x, Lk) is the number of occurrences of the label Lk ∈ L(y) in x. Note that if x is identical to y, Tnum(x, y) is equal to Tnum(y, y). Similarly, Tnum(x, y) is not less than Tnum(y, y) if y is a subgraph of x. Consequently, we can skip the subgraph isomorphism check for the pair of an enumerated pattern Pi and a constraint pattern Tj if Tnum(Pi, Tj) < Tnum(Tj, Tj), because Pi can never include Tj.
Furthermore, we can prune even more subgraph isomorphism checks, because Tnum(Pi, Tj) ≥ Tnum(Tj, Tj) does not guarantee that Pi necessarily includes Tj. In order for Pi to include Tj, for every label appearing in Tj, the number of its occurrences in Pi has to be greater than or equal to that in Tj. Namely, we can skip the subgraph isomorphism check for Pi if Pi does not satisfy this condition. To check this condition, we define the following boolean value Pinfo for two patterns x and y:

    Pinfo(x, y) = ∧_{Lk ∈ L(y)} p(x, y, Lk),    (2)

where p(x, y, Lk) = true if f(x, Lk) ≥ f(y, Lk), and false otherwise.
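A minimal sketch of the two pruning tests, assuming a pattern is represented simply by the multiset of its node/link labels; the sum-of-label-occurrences reading of Tnum is our assumption, consistent with the properties stated above.

from collections import Counter

def tnum(x_labels, y_labels):
    """Assumed reading of Tnum(x, y): total occurrences in x of the labels of y."""
    x, y = Counter(x_labels), Counter(y_labels)
    return sum(x[l] for l in y)

def pinfo(x_labels, y_labels):
    """Pinfo(x, y): every label of y occurs in x at least as often as in y."""
    x, y = Counter(x_labels), Counter(y_labels)
    return all(x[l] >= y[l] for l in y)

def needs_isomorphism_check(pattern_labels, constraint_labels):
    """Subgraph isomorphism checking can safely be skipped when this returns False."""
    return (tnum(pattern_labels, constraint_labels)
            >= tnum(constraint_labels, constraint_labels)
            and pinfo(pattern_labels, constraint_labels))

# toy examples with hypothetical label multisets
print(needs_isomorphism_check(['GPT', 'd', 'PLT', 'd'], ['GPT', 'd', 'PLT']))  # True
print(needs_isomorphism_check(['GPT', 'd', 'ALB'], ['GPT', 'd', 'PLT']))       # False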
ExtPair(D, T, L, Lv, M)
Input:  a database D, a set of constraint patterns T, the current level Lv,
        a set of extracted pairs L (initially empty),
        the constraint mode M (either “INpattern” or “EXpattern”);
Output: the set of extracted pairs L with newly extracted pairs;
begin
  if Lv = 1 then
    if M = “INpattern” then
      Enumerate pairs in D which consist of nodes or links
        appearing in T, and store them in E;
    else
      Enumerate all the pairs in D and store them in E;
  else
    Enumerate pairs which consist of one or both
      pseudo-nodes in L, and store them in E;
  for each Pi ∈ E begin
    if Pi is marked then
      L := L ∪ {Pi}; next;
    register := 1;
    for each Tj ∈ T begin
      if Tnum(Pi, Tj) ≥ Tnum(Tj, Tj) then
        if Pinfo(Pi, Tj) = true then
          if M = “INpattern” then
            if PD(Pi, Tj) = true then mark Pi;
          else
            if PD(Pi, Tj) = true then
              discard Pi; register := 0; break;
    end
    if register = 1 then L := L ∪ {Pi};
  end
  return L;
end
4 Experimental Evaluation
4.1 Experimental Settings
To evaluate the proposed method, we implemented Cl-GBI with the algorithm shown in Fig. 4 in C++ on a PC (CPU: Pentium 4, 3.2 GHz; memory: 4 GB; OS: Fedora Core release 3), and applied this constrained Cl-GBI to the hepatitis dataset. The current system has the limitation that either INpattern constraints or EXpattern constraints can be imposed at a time, but not both. In this experiment, we used two classes in the dataset, Response and Non-Response, denoted by R and N, respectively. R consists of patients for whom the interferon therapy was effective, while N consists of those for whom it was not.
Table 1. Statistics on the size of the graphs converted from the hepatitis dataset

class                                    R        N
number of graphs                        38       56
average number of nodes in a graph     104      112
maximal number of nodes in a graph     145      145
minimum number of nodes in a graph      24       20
total number of nodes                3,944    6,296
kinds of node labels                        12
average number of links in a graph     108      117
maximal number of links in a graph     154      154
minimum number of links in a graph      23       19
total number of links                4,090    6,577
kinds of link labels                        30
Fig. 5. The four sets of INpatterns used in the experiments: (a) No. 1, (b) No. 2, (c) No. 3, and (d) No. 4 (patterns over examination items such as GPT, PLT, I-BIL, ALB, CHE, HGB, and GPT_SD, connected to dummy nodes “d”)
We used 24 inspection items as attributes, and converted the records of each patient into a graph in the same way as [3]. The statistics on the size of the resulting graphs are shown in Table 1.
In this experiment, we used the four sets of INpatterns shown in Fig. 5, in which (a) to (c) are patterns reported in [7] that represent typical examination results for patients belonging to R, while (d) is the pattern with the highest information gain, i.e., the most discriminative pattern among those extracted by the current Cl-GBI. In the following, we refer to the most discriminative pattern among the extracted patterns as the MDpattern. The node with the label “d” in Fig. 5 is a dummy node representing a certain point of time. For example, the leftmost pattern in Fig. 5 (a) means that, at a certain point of time, the value of GPT (glutamic-pyruvic transaminase) is High and the value of PLT (platelet) is Low. Note that node labels in this dataset such as “d”, “H”, etc. are common and may appear with high frequency. Thus, we used only the link labels appearing in the INpatterns as constraints for the pair enumeration, as discussed above. As for the parameters of Cl-GBI, we set them as follows: b = 10, N = 10, and θ = 0%.
Fig. 6. The MDpatterns extracted by the constrained Cl-GBI: (a) No. 1, (b) No. 2, (c) No. 3, and (d) No. 4
We gave each set of patterns shown in Fig. 5 as INpatterns to the constrained Cl-GBI, and observed the computation time, the MDpattern, and its information gain in each case. The results regarding the computation time and the information gain of the MDpatterns are shown in Table 2, in which the row named “original” contains the results of the current Cl-GBI with the same parameter settings. Namely, the MDpattern shown in Fig. 5 (d) corresponds to the MDpattern in the case of “original”, and its information gain is 0.1139. “L” and “t” in parentheses denote the level and the time [sec] spent to extract the MDpattern, respectively. The resulting MDpatterns obtained by the constrained Cl-GBI are illustrated in Fig. 6.
First, focusing on the column “max information gain” in Table 2, we find that the MDpatterns extracted by the constrained Cl-GBI are more discriminative than the MDpattern of the current Cl-GBI in the cases of No. 2 and No. 4. In the cases of No. 1 and No. 3, the information gains of the MDpatterns are comparable with that of the MDpattern extracted by the current Cl-GBI. In addition, the computation times in all four cases using INpatterns are much shorter than that of the current Cl-GBI. Note that the values in parentheses in the column “time” of Table 2 are the corresponding computation times of the constrained Cl-GBI without pruning of subgraph isomorphism checks. Comparing the computation times in the same row, checking whether pairs include constraint patterns based on the numbers of nodes/links can reduce the computation time significantly. From these results, we can say that, given appropriate constraints, the constrained Cl-GBI can efficiently extract patterns that are more discriminative than those extracted by the current Cl-GBI.
Next, as shown in Fig. 6, except in the case of No. 3, the MDpatterns extracted by the constrained Cl-GBI completely include one of the given INpatterns. In the case of No. 3, the MDpattern includes a subgraph of one of the INpatterns. Thus, the proposed constrained search method can be expected to work well even when it is not certain that the given INpatterns are genuinely appropriate, because it can extract not only patterns completely including them, but also the neighborhood patterns.
In addition, note that the INpattern used in the case of No. 4 is the MDpattern obtained by the current Cl-GBI with the same parameter settings, and it works as a good constraint, succeeding in extracting more discriminative patterns. From this result, it is expected that previously obtained MDpatterns may work as good constraints to guide the search, which would be desirable if no domain knowledge is available to restrict the search space: in such a case, instead of running the current Cl-GBI only once with the maximal level L set to a large value, repeatedly running the constrained Cl-GBI with L set to a smaller value and using the MDpattern extracted by the previous run as the new INpattern might allow us to extract more discriminative patterns in less computation time. Verifying this expectation is part of our future work.
5 Conclusion
References
1. Cook, D. J. and Holder, L. B.: Substructure Discovery Using Minimum Description
Length and Background Knowledge. Artificial Intelligence Research, Vol. 1, pp.
231–255, (1994).
2. Fortin, S.: The Graph Isomorphism Problem. Technical Report TR96-20, Depart-
ment of Computer Science, University of Alberta, (1996).
3. Geamsakul, W., Yoshida, T., Ohara, K., Motoda, H., Yokoi, H., and Takabayashi,
K.: Constructing a Decision Tree for Graph-Structured Data and its Applications.
Fundamenta Informaticae Vol. 66, No.1-2, pp. 131–160, (2005).
4. Inokuchi, A., Washio, T., and Motoda, H.: Complete Mining of Frequent Patterns
from Graphs: Mining Graph Data. Machine Learning, Vol. 50, No. 3, pp. 321–354,
(2003).
5. Kuramochi, M. and Karypis, G.: An Efficient Algorithm for Discovering Frequent
Subgraphs. IEEE Trans. Knowledge and Data Engineering, Vol. 16, No. 9, pp.
1038–1051, (2004).
6. Matsuda, T., Motoda, H., Yoshida, T., and Washio, T.: Mining Patterns from
Structured Data by Beam-wise Graph-Based Induction. Proc. of DS 2002, pp.
422–429, (2002).
7. Motoyama, S., Ichise, R., and Numao, M.: Knowledge Discovery from Inconstant
Time Series Data. JSAI Technical Report, SIG-KBS-A405, pp. 27–32, in Japanese,
(2005).
8. Nguyen, P. C., Ohara, K., Motoda, H., and Washio, T.: Cl-GBI: A Novel Approach
for Extracting Typical Patterns from Graph-Structured Data. Proc. of PAKDD
2005, pp. 639–649, (2005).
9. Nguyen, P. C., Ohara, K., Mogi, A., Motoda, H., and Washio, T.: Constructing De-
cision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction.
Proc. of PAKDD 2006, pp. 390–399, (2006).
10. Quinlan, J. R.: Induction of decision trees. Machine Learning, Vol. 1, pp. 81–106,
(1986).
11. Sato, Y., Hatazawa, M., Ohsaki, M., Yokoi, H., and Yamaguchi, T.: A Rule Discov-
ery Support System in Chronic Hepatitis Datasets. First International Conference
on Global Research and Education (Inter Academia 2002), pp. 140–143, (2002).
12. Yan, X. and Han, J.: gSpan: Graph-Based Structure Pattern Mining. Proc. of the
2nd IEEE International Conference on Data Mining (ICDM 2002), pp. 721–724,
(2002).
13. Yoshida, K. and Motoda, H.: CLIP: Concept Learning from Inference Patterns.
Artificial Intelligence, Vol. 75, No. 1, pp. 63–92, (1995).
Evaluating Learning Algorithms with
Meta-learning Schemes for a Rule Evaluation
Support Method Based on Objective Indices
1 Introduction
In recent years, with huge amounts of data stored in information systems in natural science, social science, and business domains, and with the development of information technologies, people hope to find valuable knowledge suited to their purposes. Data mining techniques have become widely known as a process for utilizing data stored in database systems, combining different kinds of technologies such as database technologies, statistical methods, and machine learning methods. In particular, if-then rules are regarded as one of the most usable and readable outputs of data mining. However, for a large dataset with hundreds of attributes including noise, the mining process often obtains many thousands of rules, which rarely include rules valuable to a human expert.
Fig. 1. Overview of the rule evaluation support method: a rule evaluation model, learned by a selected learning algorithm, predicts the evaluations of new rules, and the predictions are displayed to the human expert
To support such rule selection, many efforts have been made using objective rule evaluation indices such as recall, precision, and other interestingness measures [16,30,33] (hereafter called “objective indices”). However, it is difficult to estimate the criterion of a human expert with a single objective rule evaluation index, because his/her subjective criterion, such as interestingness and importance for his/her purpose, is influenced by the amount of his/her knowledge and/or the passage of time.
To address the above issues, we have developed an adaptive rule evaluation support method for human experts that uses rule evaluation models, which predict experts' criteria based on objective indices, re-using the results of evaluations by human experts. In Section 2, we describe the rule evaluation model construction method based on objective indices. Since our method needs a more accurate rule evaluation model to support a human expert more exactly, we present a performance comparison of learning algorithms for constructing rule evaluation models in Section 3. With the results of the comparison, we show the availability of learning algorithms from a constructive meta-learning system [1] for our rule evaluation model construction approach.
rules at least once. After obtaining the training data set, a rule evaluation model is constructed by a learning algorithm. At the prediction phase, a human expert receives predictions for new rules based on their values of the objective indices. Since the task of a rule evaluation model is prediction, we need to choose a learning algorithm with high accuracy, just as in ordinary classification problems.
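As an illustration of the two phases, the following minimal sketch (not the authors' WEKA-based implementation) trains a rule evaluation model on rules described by a few illustrative objective-index values labelled I/NI/NU, and then predicts the evaluation of an unseen rule; the index names and values in the comments are assumptions for the example only.

# Minimal sketch of the construction and prediction phases.  Each rule is assumed
# to be described by a vector of objective index values (here only three
# illustrative indices) and labelled by a human expert with I / NI / NU.
from sklearn.tree import DecisionTreeClassifier

train_X = [
    [0.82, 0.40, 1.6],   # illustrative [precision, recall, lift] of an evaluated rule
    [0.35, 0.72, 0.9],
    [0.10, 0.05, 0.4],
]
train_y = ['I', 'NI', 'NU']                 # the expert's evaluations

# construction phase: learn a rule evaluation model from the evaluated rules
model = DecisionTreeClassifier().fit(train_X, train_y)

# prediction phase: estimate the expert's evaluation of an unseen rule
print(model.predict([[0.78, 0.38, 1.5]]))   # e.g. ['I']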
Table 1. The objective rule evaluation indices for classification rules used in this
research. P: Probability of the antecedent and/or consequent of a rule. S: Statistical
variable based on P. I: Information of the antecedent and/or consequent of a rule.
N: Number of instances included in the antecedent and/or consequent of a rule. D:
Distance of a rule from the others based on rule attributes.
vector machines (SVM) [25], classification via linear regressions (CLR) [7], and OneR [18]. In addition, we have also taken the following selective meta-learning algorithms: Bagging [5], Boosting [9], and Stacking [32].
In this case study, we have taken 244 rules, which were mined from six datasets about six kinds of diagnostic problems, as shown in Table 2. These datasets consist of attributes describing meningitis patients and the diagnosis of each patient as the class. Each rule set was mined with a proper rule induction algorithm composed by a constructive meta-learning system called CAMLET [14]. For each rule, we assigned one of three evaluations (I: Interesting, NI: Not-Interesting, NU: Not-Understandable), according to evaluation comments from a medical expert.
Table 2. The datasets of the meningitis datamining results and the evaluations of the mined rules

Dataset        #Attributes  #Class  #Mined rules  #'I' rules  #'NI' rules  #'NU' rules
Diag                29         6         53           15          38            0
C Cource            40        12         22            3          18            1
Culture+diag        31        12         57            7          48            2
Diag2               29         2         35            8          27            0
Course              40         2         53           12          38            3
Cult find           29         2         24            3          18            3
TOTAL                —         —        244           48         187            9
Fig. 2. The learning algorithm constructed by CAMLET for the dataset of the meningitis datamining result
Table 3. Accuracies (%), Recalls (%), and Precisions (%) of the nine learning algorithms, evaluated on the whole training dataset and with Leave-One-Out (LOO); recall and precision are reported for each class (I, NI, NU)
                         % of training sample
                  10    20    30    40    50    60    70    80    90   100
CAMLET          76.7  78.4  80.8  81.6  81.7  82.6  82.8  84.8  84.6  89.3
Stacking        69.6  77.8  75.3  77.9  72.2  82.2  75.4  83.4  86.5  81.1
Boosted J4.8    74.8  77.8  79.6  82.8  83.6  85.5  86.8  88.0  89.7  99.2
Bagged J4.8     77.5  79.5  80.5  81.4  81.8  82.1  83.2  83.2  84.1  87.3
J4.8            73.4  74.7  79.8  78.6  72.8  83.2  83.7  84.5  85.7  85.7
BPNN            74.8  78.1  80.6  81.1  82.7  83.7  85.3  86.1  87.2  86.9
SMO             78.1  78.6  79.8  79.8  79.8  80.0  79.9  80.2  80.4  81.6
CLR             76.6  78.5  80.3  80.2  80.3  80.7  80.9  81.4  81.0  82.8
OneR            75.2  73.4  77.5  78.0  77.7  77.5  79.0  77.8  78.9  82.4

Fig. 3. Accuracies (%) with training sub-samples relative to the whole training dataset (left table), and the chart of achievement rates (%) relative to the accuracies on the whole training dataset for the meta-learning algorithms
As shown in these results, SVM, CLR, and bagged J4.8 achieve more than 95% of their whole-dataset accuracy with no more than 10% of the training subset. Looking at the result of the learning algorithm constructed by CAMLET, this algorithm achieves almost the same
performance as bagged J4.8 with smaller training subsets. However, it can outperform bagged J4.8 with larger training subsets. Although the constructed algorithm is based on boosting, the combination of the reinforcement method from Classifier Systems and the outer loop has been able to overcome a disadvantage of boosting on smaller training subsets.

Fig. 4. Top 10 frequencies of the indices used in the models of each learning algorithm, with 10,000 bootstrap samples of the meningitis datamining result dataset and executions
In this case study, we have taken four datamining results about chronic hepatitis, as shown in the left table of Table 4. These datasets consist of patterns of the laboratory test values for blood and urine of chronic hepatitis patients as attributes. First, we carried out, twice, datamining processes to find relationships between the patterns of the attributes and the patterns of GPT as the class, GPT being one of the important test items for grasping the condition of each patient. Second, we also carried out, twice, other datamining processes to find relationships between the patterns of the attributes and the results of interferon (IFN) therapy. For each rule, we assigned one of three evaluations (EI: Especially Interesting, I: Interesting,
Table 4. Description of the datasets of the chronic hepatitis datamining results (left table), and an overview of the learning algorithms constructed by CAMLET for these datasets (right table)
models (Fig. 4). This shows that the medical expert evaluated these rules in terms of both correctness and interestingness, based on his background knowledge.
On each problem, the variance of the indices was reduced in each second-time datamining process. This indicates that the medical expert evaluated each second-time datamining result with a more settled criterion than that of each first-time datamining process.
Table 6. Flow diagram to obtain datasets and the datasets of the rule sets learned
from the UCI benchmark datasets
We have also evaluated our rule evaluation model construction method with rule sets from eight datasets of the UCI Machine Learning Repository [15], in order to investigate the performances without any human criteria.
We have taken the following eight datasets: anneal, audiology, autos, balance-scale, breast-cancer, breast-w, colic, and credit-a. From these datasets, we obtained rule sets with bagged PART, which repeatedly executes PART [8] on bootstrapped training sub-sample datasets.
For these rule sets, we calculated the 39 objective indices as attributes of each rule. As for the class of these datasets, we set up three class distributions using a multinomial distribution. Table 6 shows the process flow diagram for obtaining the datasets and the description of the datasets with the three different class distributions. The class distribution of “Distribution I” is P = (0.35, 0.3, 0.3), where pi is the probability of class i; thus, the number of instances of class i in each dataset Dj becomes pi |Dj |. In the same way, the probability vector of “Distribution II” is P = (0.3, 0.65, 0.05). We have investigated the performances of the learning algorithms on these balanced and unbalanced class distributions.
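A minimal sketch of how such class labels could be drawn from a multinomial distribution; the class names ('L1', 'L2', 'L3') and the sample size are placeholders for illustration only, while the probability vectors are those quoted above.

import random

def assign_classes(num_rules, probs, classes=('L1', 'L2', 'L3'), seed=0):
    """Draw one class label per rule instance according to a multinomial distribution."""
    rng = random.Random(seed)
    return [rng.choices(classes, weights=probs, k=1)[0] for _ in range(num_rules)]

dist_I = (0.35, 0.3, 0.3)        # "Distribution I" as quoted in the text
dist_II = (0.3, 0.65, 0.05)      # "Distribution II"
labels = assign_classes(200, dist_I)
print({c: labels.count(c) for c in ('L1', 'L2', 'L3')})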
Table 7. Learning algorithms constructed by CAMLET for the rule sets from the UCI benchmark datasets

Distribution I
Dataset        original classifier set  overall control structure  final eval. method
anneal         C4.5 tree                Win+Boost+CS               Weighted Voting
audiology      ID3 tree                 Boost                      Voting
autos          Random Rule              Win+Iteration              Weighted Voting
balance-scale  Random Rule              Boost                      Voting
breast-cancer  Random Rule              GA+Iteration               Voting
breast-w       ID3 tree                 Win                        Weighted Voting
colic          Random Rule              CS+Win                     Voting
credit-a       C4.5 tree                Win+Iteration              Voting

Distribution II
Dataset        original classifier set  overall control structure  final eval. method
anneal         C4.5 tree                Boost+CS                   Weighted Voting
audiology      Random Rule              Simple Iteration           Best Select.
autos          Random Rule              Boost                      Weighted Voting
balance-scale  Random Rule              CS+GA                      Voting
breast-cancer  Random Rule              Win+Iteration              Weighted Voting
breast-w       ID3 tree                 CS+Iteration               Weighted Voting
colic          ID3 tree                 Win+Iteration              Voting
credit-a       ID3 tree                 CS+Boost+Iteration         Best Select.

CS means including reinforcement of the classifier set from Classifier Systems; Boost means including methods and control structure from Boosting; Win means including methods and control structure from the Window Strategy; GA means including reinforcement of the classifier set with a Genetic Algorithm.
Table 8. Accuracies (%) on the whole training datasets labeled with the different distributions (left table), and the minimum number of training sub-samples needed to outperform the %Def. class (right table)
Accuracies (%) on the whole training datasets:

Distribution I
               J4.8  BPNN   SVM   CLR  OneR  Bagged J4.8  Boosted J4.8  Stacking  CAMLET
anneal         74.7  71.6  47.4  56.8  55.8     87.4         100.0        27.4     77.9
audiology      47.0  51.7  40.3  45.6  52.3     87.2          47.0        21.5     63.1
autos          66.7  63.8  46.8  46.1  56.0     89.4          66.7        29.8     53.2
balance-scale  58.0  59.4  39.5  43.4  53.0     83.3          58.0        39.5     39.5
breast-cancer  55.7  61.5  40.2  50.8  59.0     88.5          70.5        23.8     41.0
breast-w       86.1  91.1  38.0  46.8  54.4     96.2         100.0        34.2     77.2
colic          91.8  82.0  42.6  60.7  55.7     88.5         100.0        29.5     67.2
credit-a       57.4  48.7  35.7  39.1  54.8     91.3          57.4        26.5     55.7

Distribution II
               J4.8  BPNN   SVM   CLR  OneR  Bagged J4.8  Boosted J4.8  Stacking  CAMLET
anneal         74.7  70.5  67.4  70.5  73.7     84.2          94.7        67.4     66.3
audiology      65.8  67.8  63.8  64.4  67.1     83.2          67.1        59.7     65.1
autos          85.1  73.8  68.1  70.2  73.8     87.9         100.0        66.7     67.4
balance-scale  70.5  69.8  64.8  65.8  69.8     80.1          85.8        62.6     63.0
breast-cancer  71.3  77.0  66.4  65.6  77.9     86.9          79.5        73.0     73.0
breast-w       74.7  86.1  73.4  68.4  74.7     87.3         100.0        63.3     70.9
colic          70.5  77.0  65.6  60.7  73.8     85.2         100.0        49.2     60.7
credit-a       70.9  70.0  65.2  65.2  71.3     85.7          87.8        61.7     65.2

Minimum number of training sub-samples needed to outperform the %Def. class:

Distribution I
               J4.8  BPNN  SVM  CLR  OneR  Bagged J4.8  Boosted J4.8  Stacking  CAMLET
anneal           20    14   17   29    29      16            14          36        20
audiology        21    18   65   64    41      21            14          56        27
autos            38    28   76   77    70      28            28          77        31
balance-scale    12    14   15   15    32      14             9          51       128
breast-cancer    16    17   22   41    22      14            14          41        36
breast-w          7    10   10   18    14      10             6          19        11
colic             8     8    9   22    14       8             8          24         8
credit-a          9    12   16   30    28       9             8          51        19

Distribution II
               J4.8  BPNN  SVM  CLR  OneR  Bagged J4.8  Boosted J4.8  Stacking  CAMLET
anneal           54    58   64   76     -      42            38          64        46
audiology        64    73   45   76   107      50            50         103        84
autos            66   102   84  121    98      45            39          76        76
balance-scale   118   103  133  162   156      86            92         132         -
breast-cancer    50    31   80   92    80      38            36          60        41
breast-w         44    36   31   48    71      34            34          52        53
colic            28    24   46   30    42      28            22          48        54
credit-a        118   159    -    -   173      76            76         120       109
4 Conclusion
In this paper, we have described the evaluation of the nine learning algorithms
for a rule evaluation support method with rule evaluation models to predict
References
8. Frank, E., Witten, I. H.: Generating accurate rule sets without global optimization,
in Proc. of the Fifteenth International Conference on Machine Learning, (1998)
144–151
9. Freund, Y., and Schapire, R. E.: Experiments with a new boosting algorithm, in
Proc. of Thirteenth International Conference on Machine Learning (1996) 148–156
10. Gago, P., Bento, C.: A Metric for Selection of the Most Promising Rules. Proc. of
Euro. Conf. on the Principles of Data Mining and Knowledge Discovery PKDD-
1998 (1998) 19–27
11. Goodman, L. A., Kruskal, W. H.: Measures of association for cross classifications.
Springer Series in Statistics, 1, Springer-Verlag (1979)
12. Gray, B., Orlowska, M. E.: CCAIIA: Clustering Categorical Attributes into Inter-
esting Association Rules. Proc. of Pacific-Asia Conf. on Knowledge Discovery and
Data Mining PAKDD-1998 (1998) 132–143
13. Hamilton, H. J., Shan, N., Ziarko, W.: Machine Learning of Credible Classifications.
in Proc. of Australian Conf. on Artificial Intelligence AI-1997 (1997) 330–339
14. Hatazawa, H., Negishi, N., Suyama, A., Tsumoto, S., and Yamaguchi, T.: Knowl-
edge Discovery Support from a Meningoencephalitis Database Using an Automatic
Composition Tool for Inductive Applications, in Proc. of KDD Challenge 2000 in
conjunction with PAKDD2000 (2000) 28–33
15. Hettich, S., Blake, C. L., and Merz, C. J.: UCI Repository of machine learning
databases [http://www.ics.uci.edu/˜mlearn/MLRepository.html], Irvine, CA: Uni-
versity of California, Department of Information and Computer Science, (1998).
16. Hilderman, R. J. and Hamilton, H. J.: Knowledge Discovery and Measure of Inter-
est, Kluwer Academic Publishers (2001)
17. Hinton, G. E.: “Learning distributed representations of concepts”, in Proc. of 8th
Annual Conference of the Cognitive Science Society, Amherst, MA. Reprinted in
R. G. M. Morris (ed.) (1986)
18. Holte, R. C.: Very simple classification rules perform well on most commonly used
datasets, Machine Learning, Vol. 11 (1993) 63–91
19. Klösgen, W.: Explora: A Multipattern and Multistrategy Discovery Assistant. in
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy R. (Eds.): Advances
in Knowledge Discovery and Data Mining. AAAI/MIT Press, California (1996)
249–271
20. Michalski, R., Mozetic, I., Hong, J. and Lavrac, N.: The AQ15 Inductive Learn-
ing System: An Over View and Experiments, Reports of Machine Learning and
Inference Laboratory, No.MLI-86-6, George Mason University (1986).
21. Mitchell, T. M.: Generalization as Search, Artificial Intelligence, 18(2) (1982) 203–
226
22. Ohsaki, M., Sato, Y., Kitaguchi, S., Yokoi, H., and Yamaguchi, T.: Comparison
between Objective Interestingness Measures and Real Human Interest in Medical
Data Mining, in Proc. of the 17th International Conference on Industrial and Engi-
neering Applications of Artificial Intelligence and Expert Systems IEA/AIE 2004,
LNAI 3029, (2004) 1072–1081
23. Ohsaki, M., Kitaguchi, S., Kume, S., Yokoi, H., and Yamaguchi, T.: Evaluation
of Rule Interestingness Measures with a Clinical Dataset on Hepatitis, in Proc. of
ECML/PKDD 2004, LNAI3202 (2004) 362–373
24. Piatetsky-Shapiro, G.: Discovery, Analysis and Presentation of Strong Rules. in
Piatetsky-Shapiro, G., Frawley, W. J. (eds.): Knowledge Discovery in Databases.
AAAI/MIT Press (1991) 229–248
88 H. Abe et al.
25. Platt, J.: Fast Training of Support Vector Machines using Sequential Minimal Op-
timization, Advances in Kernel Methods - Support Vector Learning, B. Schölkopf,
C. Burges, and A. Smola, eds., MIT Press (1999) 185–208
26. Quinlan, J. R. : Induction of Decision Tree, Machine Learning, 1 (1986) 81–106
27. Quinlan, R.: C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers,
(1993)
28. Rijsbergen, C.: Information Retrieval, Chapter 7, Butterworths, London, (1979)
http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
29. Smyth, P., Goodman, R. M.: Rule Induction using Information Theory. in
Piatetsky-Shapiro, G., Frawley, W. J. (eds.): Knowledge Discovery in Databases.
AAAI/MIT Press (1991) 159–176
30. Tan, P. N., Kumar V., Srivastava, J.: Selecting the Right Interestingness Measure
for Association Patterns. Proc. of Int. Conf. on Knowledge Discovery and Data
Mining KDD-2002 (2002) 32–41
31. Witten, I. H. and Frank, E.: Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations, Morgan Kaufmann, (2000)
32. Wolpert, D. : Stacked Generalization, Neural Network 5(2) (1992) 241–260
33. Yao, Y. Y., Zhong, N.: An Analysis of Quantitative Measures Associated with Rules.
Proc. of Pacific-Asia Conf. on Knowledge Discovery and Data Mining PAKDD-1999
(1999) 479–488
34. Zhong, N., Yao, Y. Y., Ohshima, M.: Peculiarity Oriented Multi-Database Mining.
IEEE Trans. on Knowledge and Data Engineering, 15, 4, (2003) 952–960
Training Classifiers for Unbalanced Distribution
and Cost-Sensitive Domains with ROC Analysis
1 Introduction
Classification learning is usually measured by accuracy. However, when the class distribution is skewed or the classification error costs are unequal, accuracy cannot ensure that the total error cost of the classification algorithm is minimal. For example, in industrial risk management and in medical decision making over healthcare data, the “healthy” cases far outnumber the “patient” ones; when checking for credit card abuse, the “normal use” cases far outnumber the “abuse” ones. Moreover, in some domains the class distribution of the data can vary remarkably. For example, the proportion of finance fraud varies significantly from month to month and from place to place [1]. In these applications, the accuracy of predicting all samples to be “healthy” or “normal” can be more than 0.99, since more than 99% of the data belong to “healthy” or “normal”. Therefore, in classification learning on data with unbalanced class distributions, an accuracy of 0.99 does not indicate that a classifier has good classification ability.
In such domains, accuracy cannot properly evaluate the learning results or the learning performance. Traditional machine learning techniques basically consider the case in which the algorithms learn from balanced data, where it is assumed that the distribution of the training samples is uniform and that the classification error cost (i.e., the cost of misclassifying a positive example is equal to that
of misclassifying a negative example) is the same in both the training and the predicting data. In fact, practical application data are almost always unbalanced, especially in medical diagnosis, pattern recognition, decision-making theory, and most kinds of risk management domains.
ROC (Receiver Operating Characteristic) analysis [2] is an evaluation tool. Using a graphical mode, ROC can be used to measure the classification ability of a classifier under any data distribution and error cost. An ROC curve is relatively insensitive to the class distribution and the error cost, which makes ROC a useful evaluation criterion for learning from cost-sensitive and/or unbalanced class data. ROC analysis is ready to use for two-class classification domains; however, methods for multi-class problems need further study.
This paper introduces an algorithm, EMAUC, which is based on ROC analysis and performs classification in multi-class domains. The algorithm uses Error Correcting Output Codes [3] to transform a multi-class dataset into several two-class datasets, generates ROC curves from these two-class datasets, and finally synthesizes a multi-class classifier. Compared with other multi-class classifiers, EMAUC has the advantages of competitive performance, better comprehensibility, and low sensitivity to skewed datasets. The EMAUC algorithm employs two-class classifiers to perform the two-class classifications. It is built on a machine learning tool (WEKA [4]) and an ROC analysis graph tool (ROCon [5]). Experimental results with UCI [6] datasets show that EMAUC has good learning performance in multi-class classification domains (including unbalanced distribution and cost-sensitive domains).
The rest of this paper is arranged as follows. Section 2 describes ROC analysis. Section 3 presents the EMAUC algorithm. Section 4 demonstrates the experimental results. Section 5 discusses related work in this area. The final section is the conclusion.
2 ROC Analysis
ROC analysis originates from statistical decision-making theory in the 1950s and was introduced to capture the connection between the hit rates and false alarm rates of classifiers. ROC has been used in domains such as transistors, psychology, and medical decision making. Swets [7] extended ROC analysis to wider application domains. ROC analysis has the ability to evaluate classifiers learned from data with unbalanced class distributions and unequal classification error costs. As mentioned above, since accuracy can hardly evaluate learning results under unbalanced distributions and unequal classification error costs, ROC may replace accuracy as a better evaluation criterion.
As is well known, the performance of a two-class classifier can be described with a 2 × 2 confusion matrix. Let the FP rate be the ratio of negatives incorrectly classified to total negatives, and the TP rate be the ratio of positives correctly classified to total positives; then ROC can be described by a two-dimensional graph [8] in which the TP rate is plotted on the Y axis and the FP rate is plotted on the X axis.
On the one hand, classifiers such as C4.5 and SVM, whose outputs are class labels, produce a single pair of TP and FP rates on a dataset. On the other hand, classifiers such as Naive Bayes and neural networks output numerical values indicating the possibility that an example belongs to a class. By setting a threshold, if this possibility is higher than the threshold, the example is assigned to the “Yes” class, and otherwise to the “No” class.
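A minimal sketch of how a score-producing classifier yields a set of ROC points by sweeping such a threshold; the variable names and the toy scores below are our own.

def roc_points(scores, labels):
    """(FP rate, TP rate) pairs obtained by sweeping a threshold over classifier scores.

    scores : estimated possibility of the positive ("Yes") class for each example
    labels : 1 for positive examples, 0 for negative examples
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 1)
        fp = sum(1 for p, l in zip(predicted, labels) if p == 1 and l == 0)
        points.append((fp / neg, tp / pos))
    return points

print(roc_points([0.9, 0.8, 0.7, 0.55, 0.4, 0.3], [1, 1, 0, 1, 0, 0]))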
There are wide applications of ROC with two-class classifiers. The evaluation criterion has been developed into the Cost Curve [9], AUC (Area Under the ROC curve) [20], and so on. AUC has been used as a criterion to evaluate learning performance, since it simply describes the integrative capability by calculating the area under the ROC convex hull. It is difficult, however, to describe the learning performance of multi-class classifiers. Analogously to the two-class confusion matrix, there is an N × N confusion matrix including N correct classifications (elements on the main diagonal) and N² − N misclassifications (elements off the main diagonal). Within multi-class ROC analysis, the relationship between TP and FP involves N² − N independent variables, so the points live in an (N² − N)-dimensional space. For example, given a three-class learning domain, the points in the ROC space form a (3² − 3) = 6 dimensional polytope. In fact, computing the convex hull of such a high-dimensional geometrical object is an NP-hard problem [8]. Therefore, a solution for multi-class classification based on ROC analysis cannot be found by directly extending the technique of two-class ROC analysis.
There has been some work extending two-class classification to multi-class classification in ROC analysis. Classifiers based on three-class ROC analysis [8,11,14] are difficult to extend to more than five classes because of computational feasibility. Therefore, researchers have sought other methods, such as OVA (One-Vs-All) and pairwise approaches, to avoid the computational complexity; but with these methods one cannot directly select the best algorithm for learning as in two-class ROC analysis. The HTM method [15] aims to solve multi-class problems; it uses a function that computes the average pairwise AUC in multi-class classification, which is close to the pairwise way of extending AUC. It obtains the best measure of multi-class classification performance without considering any misclassification cost. Therefore, some issues still remain in the HTM method.
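To make the averaging idea concrete, the sketch below computes an average pairwise AUC in the spirit of the HTM measure, under the simplifying assumption that, for each pair of classes, the examples of the first class carry scores that are compared directly with the scores of the second class; this is a simplification of Hand and Till's definition, not a faithful reimplementation.

from itertools import combinations

def auc_two_class(pos_scores, neg_scores):
    """AUC as the probability that a positive example outscores a negative one
    (ties count as one half); equivalent to the area under the ROC curve."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def average_pairwise_auc(scores_by_class):
    """Average of the pairwise AUCs over all unordered class pairs (simplified HTM-style)."""
    pairs = list(combinations(sorted(scores_by_class), 2))
    return sum(auc_two_class(scores_by_class[i], scores_by_class[j])
               for i, j in pairs) / len(pairs)

toy = {'a': [0.9, 0.7], 'b': [0.4, 0.6], 'c': [0.2, 0.1]}
print(average_pairwise_auc(toy))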
3 EMAUC Algorithm
This section describes a method, the EMAUC algorithm (Multi-AUC with ECOC), for solving the multi-class ROC problem. Using ECOC (Error Correcting Output Codes) [3], EMAUC transforms a multi-class dataset into several two-class datasets. By coding the target classes, ECOC transforms the multi-class training set into several independent two-class training sets. The transformation can be explained with the following example.
Table 1 shows a data set with three target class values, “S”, “V”, and “I”. Table 2 is a kind of ECOC form. Each column is used to generate a two-class training set. For instance, the first column binary code (1 0 0) is employed to replace
program) of the WEKA platform is open, so one can rewrite the source code and improve the capability of some algorithms. EMAUC trains the two-class training sets with C4.5 [16], Logistic, Naive Bayes, NBtree, and NN (neural network). Finally, it calculates the AUC with a reconstructed ROCon tool.
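A minimal sketch of the ECOC decomposition step under stated assumptions: the first column of the hypothetical code matrix below reproduces the (1 0 0) example mentioned above, while the remaining columns are made up purely for illustration. Each resulting two-class set would then be handed to one of the two-class learners and the per-column AUCs averaged into the EMAUC value.

# Hypothetical ECOC code matrix for classes S, V, I; the first column (1, 0, 0)
# relabels S as positive and V, I as negative, as in the example above.
CODE = {'S': (1, 1, 0), 'V': (0, 1, 1), 'I': (0, 0, 1)}

def ecoc_datasets(examples):
    """Split a multi-class dataset into one two-class dataset per ECOC column.

    examples : list of (feature_vector, class_label) pairs
    returns  : one list of (feature_vector, 0/1 label) pairs per column
    """
    n_columns = len(next(iter(CODE.values())))
    return [[(x, CODE[y][col]) for x, y in examples] for col in range(n_columns)]

data = [([1.0, 2.0], 'S'), ([0.5, 1.5], 'V'), ([2.0, 0.1], 'I')]
binary_sets = ecoc_datasets(data)
print(binary_sets[0])   # the two-class dataset induced by the first column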
4 Experimental Results
The experiments have been carried out on 25 UCI [6] datasets. Table 6 describes these datasets, whose number of target classes is more than 3. Our system transforms a multi-class training set into several two-class training sets. For each two-class training set, EMAUC invokes C4.5, Logistic, Naive Bayes, NBtree, and NN to train two-class classifiers, respectively. Then the system computes the AUC value for each two-class training set. The EMAUC value of each multi-class training set is then obtained from these AUCs.
Table 7 demonstrates the learning performance evaluated by EMAUC on the 25 datasets, where the two-class classifiers are ZeroR, C4.5, Logistic, Naive
5 Related Work
ROC analysis has been used as a useful tool in machine learning because it allows one to assess decision functions that cannot be calibrated, even when the prior distribution of the classes is unknown [17]. Provost and Fawcett [19] discuss the evaluation of learning algorithms by means of ROC rather than accuracy. Ling et al. [20] have proved that AUC is a statistically consistent and more discriminating measure than accuracy. This research group has also experimentally compared the performance of NB, C4.5, and SVM using AUC in [21], but only on two-class training sets. These works do not consider how to divide a multi-class training set into two-class training sets, which is where our work differs. With ECOC, we separate a multi-class dataset into two-class datasets.
Flach and Lachiche [22] use ROC curves to improve the accuracy and cost of multi-class probabilistic classifiers. They use a hill-climbing approach to adjust the weight of each class. A multi-class probabilistic classifier is turned into a categorical classifier by setting weights on the class scores for all classes and assigning the class which maximizes the weighted score; the best weights are then found by tuning the scores toward optimal cost or accuracy. This method also differs from ours in training style: our algorithm first divides a given multi-class data set into several two-class training sets with ECOC, then trains on each two-class training set, and finally computes the EMAUC from these training results.
By now, AUC is used not only to evaluate classification algorithms but also as an objective to be maximized in variants of learning methods, including SVM, boosting, regression, and so on.
6 Conclusion
This paper introduces a multi-class ROC analysis technique based on ECOC. The method has been implemented and experimentally evaluated with a set of multi-class training sets. The results demonstrate that this technique has competitive performance, better comprehensibility, and less sensitivity to skewed datasets. The proposed algorithm takes the average AUC value of the two-class classifiers to calculate EMAUC. One direction for future work is to find applications of the proposed algorithm EMAUC in real-world domains.
Acknowledgements
We thank the members of Machine Learning and Artificial Intelligence Labora-
tory, School of Computer Science and Technology, Wuhan University of Science
and Technology, for their helpful discussion within seminars. This work was
supported in part by the Scientific Research Foundation for the Returned Over-
seas Chinese Scholars, State Education Ministry, and the Project (No.2004D006)
from Hubei Provincial Department of Education, P. R. China.
References
1. T. Fawcett and F. Provost. Adaptive Fraud Detection. Data Mining and Knowledge
Discovery, 291-316, 1997.
2. L. B. Lusted. Logical Analysis in Roentgen Diagnosis. Radiology. 74:178-193. 1960.
3. T. G. Dietterich and G. Bakiri. Solving Multiclass Learning Problems via Error
Correcting Output Codes. Journal of Artificial Intelligence Research, 2:263-286, 1995.
4. WEKA. www.cs.waikato.ac.nz/ml/weka.
5. ROCon. www.cs.bris.ac.uk/Research/MachineLearning/rocon.
6. C.J. Merz, P.M. Murphy, and D.W. Aha. UCI repository of machine learning
databases, University of California, Irvine. Available:
http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
7. J. A. Swets, R. M. Dawes, and J. Monahan. Better Decisions through Science.
Scientific American, 2000.
8. T. Fawcett. ROC Graphs: Notes and Practical Considerations for Researchers.
Machine Learning, 2004.
9. C. Drummond and R. C. Holte. What ROC Curves Can’t Do (and Cost Curves
Can). Proceedings of the ROC Analysis in Artificial Intelligence, 1st International
Workshop, 19-26, 2004.
10. C. X. Ling, J. Huang, and H. Zhang. AUC: a Better Measure than Accuracy in
Comparing Learning Algorithms. Canadian Conference on AI, 2003.
11. D. Mossman. Three-way ROCs. Medical Decision Making, 19(1):78-89, 1999.
12. C. Ferri, P. A. Flach, and J. Hernandez-Orallo. Learning Decision Trees Using
the Area Under the ROC Curve. In Proceedings of the Nineteenth International
Conference on Machine Learning ICML, 139-146, 2002.
13. C. Ferri, J. Hernández-Orallo, and M.A. Salido. Volume Under the ROC Surface
for Multi-class Problems. Proceedings of 14th European Conference on Machine
Learning, ECML, 2003.
14. C. Ferri, J. Hernández-Orallo, and M.A. Salido. Volume Under the ROC Surface
for Multi-class Problems. Exact Computation and Evaluation of Approximations.
2003, Univ. Politecnica de Valencia: Valencia. 1-40. DSIC. Univ. Politc. Valncia.
2003.
15. D.J. Hand and R.J. Till. A Simple Generalization of the Area Under the ROC
Curve for Multiple Class Classification Problems. Machine Learning, 45(2): 171-
186, 2001.
16. J.R. Quinlan. C4.5: Programs for Machine Learning. San Mateo, California:
Morgan Kaufmann, 1993.
17. A.P. Bradley. The Use of the Area under the ROC Curve in the Evaluation of
Machine Learning Algorithms. Pattern Recognition, 30:1145-1159.1997.
18. P.A. Flach. The Geometry of ROC Space: Using ROC Isometrics to Understand
Machine Learning Metrics. Proceedings of the International Conference on Machine
Learning, 2003.
1 Introduction
Thousands of academic papers related to scientific research appear every year, and the literature accumulated over the years is voluminous. It may be very useful to be able to visualize the entire body of scientific knowledge and to track the latest developments in particular science and technology fields. However, making knowledge visualizations clear and easy to interpret is a challenging task. Many studies have drawn their citation data by using a key phrase to query citation indexes. Retrieving citation data by a simple query of citation indexes is a rather crude and limiting technique.
A knowledge domain is represented collectively by the research papers in that research area and their interrelationships. A knowledge domain's intellectual structure can be discerned by studying the citation relationships and analyzing the seminal literature of that knowledge domain. Knowledge Management (KM) is a fast growing field with great potential. However, researchers hold disagreeing opinions about what constitutes the content and context of the KM research area [1]. The intellectual structure of the KM domain has so far been constructed with predominantly information systems and management oriented factors [2, 3]. Our study drew primarily on the voluminous science and engineering literature, which has given us some interesting
Ponzi [2] studied the intellectual structure and interdisciplinary breadth of KM.
Intellectual structure is established by a principal component analysis applied to an
author co-citation frequency matrix. The author co-citation frequencies used were
derived from the 1994-1998 academic literature and captured by the single search
phrase of “Knowledge Management.” The study found four factors, which were
labeled Knowledge Management, Organizational Learning, Knowledge-based
Theories, and The Role of Tacit Knowledge in Organizations. The interdisciplinary
breadth surrounding Knowledge Management was discovered mainly in the discipline
of management. The study validated the hypothesis with empirical evidence that the
discipline of Computer Science is not a key contributor in KM.
Subramani et al. [3] examined KM research from 1990-2002; this examination highlighted the intellectual structure of management-related research in the field.
The results revealed the existence of eight subfields of research on the topic, which
include Knowledge as Firm Capability, Organizational Information Processing and IT
Support for KM, Knowledge Communication, Transfer and Replication, Situated
Learning and Communities of Practice, Practice of Knowledge Management,
Innovation and Change, Philosophy of Knowledge, and Organizational Learning and
Learning Organizations. These sub-fields reflect the influence of a wide array of
foundational disciplines such as management, philosophy, and economics.
The studies reviewed above drew their citation data by using a key phrase querying
citation indexes. The citation data retrieved by a simple query of citation indexes were
rather limited. Our proposed approach is based on a scheme, which constructs a full
citation graph from the data drawn from the online citation database CiteSeer [14].
The proposed procedure leverages the CiteSeer citation index by using key phrases to
query the index and retrieve all matching documents from it. The documents retrieved
by the query are then used as the initial seed set to retrieve papers that cite or are cited by the literature in the initial seed set [15]. The full citation graph is built by linking all the articles retrieved, which includes more documents than the other schemes reviewed earlier. The resulting citation graph was built from the literature and citation information retrieved by querying the term “Knowledge Management” on CiteSeer in March 2006. The complete citation graph contains 599,692 document nodes and 1,701,081 citation arcs. In order to keep the highly cited papers and keep the literature at a manageable size, we pruned out papers that were cited fewer than 150 times. The resultant citation graph contains 255 papers and 776 citation arcs.
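A minimal sketch of the seed-expansion and pruning steps, assuming hypothetical lookup functions that stand in for the CiteSeer queries; the 150-citation threshold is the one quoted above.

import networkx as nx

def build_citation_graph(seed_papers, get_citing, get_cited):
    """Expand a seed set of paper ids into a citation graph with arcs citing -> cited.

    get_citing / get_cited are hypothetical functions returning the ids of papers
    that cite, respectively are cited by, a given paper.
    """
    g = nx.DiGraph()
    for paper in seed_papers:
        for citing in get_citing(paper):
            g.add_edge(citing, paper)
        for cited in get_cited(paper):
            g.add_edge(paper, cited)
    return g

def prune_by_citations(g, min_citations=150):
    """Keep only papers cited at least min_citations times (in-degree threshold)."""
    keep = [n for n, d in g.in_degree() if d >= min_citations]
    return g.subgraph(keep).copy()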
The co-citation matrix is derived from the citation graph and fed to factor analysis. Seventeen factors were determined, which explain 45.5% of the total variance. The unit of analysis used here is the document rather than the author. An author is usually considered a proxy of the specialty he or she represents; however, a researcher's specialty may change or evolve over time. We therefore took the document as the unit of analysis in our study. The factors and their loadings are listed in Table 1.
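A minimal sketch of deriving document co-citation counts from such a citation graph (two papers are co-cited whenever the same citing paper references both); the graph representation follows the sketch above.

from itertools import combinations
from collections import Counter

def cocitation_counts(citation_graph):
    """Count how often each unordered pair of papers is cited by the same citing paper.

    citation_graph : networkx.DiGraph with arcs citing -> cited
    """
    counts = Counter()
    for citing in citation_graph.nodes():
        cited = sorted(citation_graph.successors(citing))
        for a, b in combinations(cited, 2):
            counts[(a, b)] += 1
    return counts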
Factor one represents research on querying semi-structured data or heterogeneous information sources. In contrast with traditional relational and object-oriented database systems that force all data to adhere to an explicitly specified schema, much of the information available on-line, such as a WWW site, is semi-structured. Semi-structured data is too varied, irregular, or mutable to map easily to a fixed schema. Knowledge, in contrast with data and information, is inherently unstructured. The study of querying semi-structured data could be considered a harbinger of querying semi-structured or unstructured knowledge bases.
Factor two represents research on inductive learning and inductive logic
programming. Inductive learning includes works focused on learning concepts from
examples and learning algorithms. Inductive Logic Programming (ILP) is a machine
learning approach that uses techniques of logic programming. ILP systems develop
predicate descriptions from examples and background knowledge. The derived
predicate descriptions, examples, and background knowledge are all described as logic
programs. These areas of studies try to derive new knowledge from existing facts and
background knowledge.
Factor three is characterized by research on efficient search and data structures for multi-dimensional objects. This area of study includes research on querying image content by specifying the color, texture, and shape of image objects, and on the application of R+-trees for dynamic indexing of multi-dimensional objects. Novel applications of this line of research to similarity search in massive DNA sequence databases are a new trend.
Factor four includes earlier knowledge-base related research, such as deductive databases and logic programming. Non-monotonic reasoning and logic, as well as semantic issues in non-monotonic logic and predicate logic, are also covered.
Factor five represents machine learning related research, which encompasses
inductive learning, statistical learning, classifiers, and learning algorithms and
programs. The machine learning programs referred to here are based on a
classification model that discovers and analyzes patterns found in records. C4.5 is an
example of an inductive method that finds generalized rules by identifying patterns in
data. This factor appears to comprise the transitional works that bridge or lead from the area of machine learning to data mining.
Factor six represents distributed problem solvers and rational agents, where an agent is essentially a delegate that solves problems with human-like intelligence. The study of the architecture of a resource-bounded rational agent and the formulation of the agent's intentions seems to be the precursor of research in the area of intelligent agents. An intelligent agent is described as a self-contained, autonomous software module that can perform certain tasks on behalf of its users. It can also interact with other intelligent agents and/or humans in performing its task(s).
Factor seven represents Knowledge Interchange Format (KIF) and knowledge
sharing. KIF is essentially a language designed for use in the interchange of
knowledge among disparate computer systems. When a computer system needs to
communicate with another computer system, it maps its internal data structures
into KIF. Alternatively, when a computer system reads a knowledge base coded in
KIF, it converts the data into its own internal form. The research of knowledge
sharing tried to find ways of preserving existing knowledge bases and of sharing,
reusing, and building on them. This line of research tries to develop the enabling
technologies to facilitate reusing knowledge bases that have been built and used by AI
systems.
Factor eight represents data mining research that discusses effective and efficient algorithms for mining association rules from large databases. Data mining, in essence, is a knowledge discovery technique that tries to learn rules and patterns from large amounts of data. The roots of data mining can be traced back to machine learning, as already mentioned.
Factor nine represents constraint query languages and constraint logic programming, which are subfields of constraint programming, a paradigm that describes computations by specifying how those computations are constrained. Constraint programming paradigms are therefore inherently declarative, like Prolog clauses.
Factor ten represents the study of providing unified views of information from
diverse sources or data located within distributed and heterogeneous databases. Papers
under this factor try to utilize the semantic model of a problem domain that could
provide a unified view to integrate information from disparate information sources.
Factors eleven through seventeen explain less than eleven percent of the total variance, so we review them only briefly. Factor eleven is characterized by modal and temporal logic. Factor twelve represents a language-independent model for parallel and distributed programming, such as the Linda system [16]. Factor thirteen represents functional languages and their development environments. Factor fourteen represents languages for the development and communication of information agents. Factor fifteen represents STRIPS planning, a simple and compact way of expressing planning problems: instead of allowing complex logic statements in the knowledge base, STRIPS permits only simple positive facts, and everything not explicitly listed as true in the knowledge base is considered false. Factor sixteen is characterized by the World Wide Web and search engines. Factor seventeen represents Bayesian networks, which combine knowledge with statistical data for learning.
The Pearson correlation coefficients between items (papers) were calculated when factor analysis was applied. These correlation coefficients are used as the basis for PFNET scaling [17]. The value of the Pearson correlation coefficient falls in the range -1 to 1, approaching one when two items correlate completely. Items that are closely related, i.e., highly correlated, should be placed close together spatially. The distance between nodes is normalized by taking d = 1/(1 + r), where r is the correlation coefficient. The distance between items is thus inversely related to the correlation coefficient, placing less correlated items far apart and highly correlated items spatially adjacent.
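As a minimal illustration of this normalisation (a sketch only; the function names and the use of plain numeric profile vectors for the items are our own, not part of the original analysis), the mapping from Pearson's r to the PFNET distance can be written as:

from math import sqrt

def pearson_r(x, y):
    # Pearson correlation coefficient between two equal-length numeric vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pfnet_distance(x, y):
    # d = 1/(1 + r): highly correlated items receive small distances
    # (undefined at r = -1, i.e. perfectly anti-correlated items).
    r = pearson_r(x, y)
    return 1.0 / (1.0 + r)

print(pfnet_distance([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]))   # small distance for similar profiles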
Fig. 1. PFNET Scaling with Papers under Same Factors Close to Each Other
4 Research Trends in KM
Similar to the reviewed methodologies, we sought to identify research trends in KM.
Instead of searching all papers in the citation index database, we limited our search to
literature published during the last four years. We do not repeat the description of the
analytical procedure because it is the same as in Section 2.1. Ten factors were
identified by the factor analysis procedure.
Semantic Web, ontology, and Web ontology related research represents the recent popular trends in the KM domain. Distributed knowledge representation, reasoning systems, and description logics are also of interest owing to the proliferation of the World Wide Web. In addition, classifiers and pattern learning, especially for the Web and for hidden databases with Web front ends, are active research areas. Generally speaking, work on the Extensible Markup Language (XML) and related topics is increasing. The issue of trust and reliable Web Services composition also constitutes one of the ten factors.
5 Conclusion
The intellectual structure of KM has been studied earlier by researchers in the Information Systems (IS) field. The findings of IS researchers are idiosyncratically inclined toward IS related research. This bias is probably due to their seminal-author selection procedure, further compounded by the citation compilation process.
Our study draws on the CiteSeer citation index, which is rooted primarily in the fields of computer science, information science, and engineering. The intellectual structure of knowledge management derived from a predominantly science and engineering oriented index is quite different from that provided by IS researchers. Our results reveal seventeen sub-areas that form the conceptual groundwork of KM.
The current research trends of KM were briefly summarized. Research that intertwines the World Wide Web and XML with classical AI topics appears to be the new direction. However, we have seen only limited new research that tries to leverage the rich AI tradition of the past in pursuing Web related fields. Trust and security related issues are receiving more attention due to burgeoning Electronic Commerce and Electronic Business.
KM encompasses a fairly wide range of studies. It is also a field with great
potential since knowledge has become the most important ingredient in modern
businesses. The KM related research within science and engineering could provide
the theoretical and infrastructural support that is needed by practitioners and
researchers in other fields.
References
1. Earl, M.: Knowledge Management Strategies: Toward a Taxonomy. Journal of
Management Information Systems 18 (2001) 215-242
2. Ponzi, L.J.: The Intellectual Structure and Interdisciplinary Breadth of Knowledge
Management: a Bibliometric Study of Its Early Stage of Development. Scientometrics,
Vol. 55 (2002) 259-272
3. Subramani, M., Nerur, S.P., Mahapatra, R.: Examining the Intellectual Structure of
Knowledge Management, 1990-2002 – An Author Co-citation Analysis. Management
Information Systems Research Center, Carlson School of Management, University of
Minnesota (2003) 23
4. White, H.D., Griffith, B.C.: Author Cocitation: A Literature Measure of Intellectual
Structure. Journal of the American Society for Information Science 32 (1981) 163-171
5. Chen, C.: Visualization of Knowledge Structures. In: Chang, S.K. (ed.): HANDBOOK OF
SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, Vol. 2. World
Scientific Publishing Co., River Edge, NJ, (2002) 700
6. Chen, C.: Searching for Intellectual Turning Points: Progressive Knowledge Domain
Visualization. PNAS 101 (2004) 5303-5310
7. Chen, C., Kuljis, J., Paul, R.J.: Visualizing Latent Domain knowledge. Systems, Man and
Cybernetics, Part C, IEEE Transactions on 31 (2001) 518-529
8. Chen, C., Paul, R.J.: Visualizing a Knowledge Domain's Intellectual Structure. Computer
34 (2001) 65-71
9. Culnan, M.J.: The Intellectual Development of Management Information Systems, 1972-
1982: A Co-Citation Analysis. Management Science 32 (1986) 156-172
10. Culnan, M.J.: Mapping the Intellectual Structure of MIS, 1980-1985: A Co-Citation
Analysis. MIS Quarterly 11 (1987) 340
11. McCain, K.W.: Mapping Authors in Intellectual Space: A Technical Overview. Journal of
the American Society for Information Science 41 (1990) 433-443
12. White, H.D.: Pathfinder Networks and Author Cocitation Analysis: A Remapping of
Paradigmatic Information Scientists. Journal of the American Society for Information
Science & Technology 54 (2003) 423-434
13. Chen, C., Steven, M.: Visualizing Evolving Networks: Minimum Spanning Trees versus
Pathfinder Networks. IEEE Symposium on Information Visualization. IEEE Computer
Society (2003) 67-74
14. Bollacker, K.D., Lawrence, S., Giles, C.L.: CiteSeer: an Autonomous Web Agent for
Automatic Retrieval and Identification of Interesting Publications. Proceedings of the
second international conference on Autonomous agents. ACM Press, Minneapolis,
Minnesota, United States (1998)
15. Chen, T.T., Xie, L.Q.: Identifying Critical Focuses in Research Domains. Proceedings of
the Information Visualisation, Ninth International Conference on (IV'05). IEEE Computer
Society, London (2005) 135-142
16. Ledru, P.: JSpace: Implementation of a Linda System in Java. SIGPLAN Not. 33 (1998)
48-50
17. White, H.D.: Author Cocitation Analysis and Pearson's r. Journal of the American Society
for Information Science & Technology 54 (2003) 1250-1259
Evaluation of the FastFIX
Prototype 5Cs CARD System
Abstract. The 5Cs architecture offers a hybrid Case And Rule-Driven (CARD)
system that supports the Collaborative generation and refinement of a relational
structure of Cases, ConditionNodes, Classifications, and Conclusions (hence
5Cs). It stretches the Multiple Classification Ripple Down Rules (MCRDR)
algorithm and data structure to encompass collaborative classification,
classification merging, and classification re-use. As well, it offers a very
lightweight collaborative indexing tool that can act as an information broker to
knowledge resources across an organisation's Intranet or across the broader
Internet, and it supports the coexistence of multiple truths in the knowledge
base. This paper reports the results of the software trial of the FastFIX prototype, an early implementation of the 5Cs model, in a 24x7 high-volume ICT support centre.
1 Introduction
The 5Cs model comprises the Collaborative generation and refinement of a
relational structure of Cases, ConditionNodes, Classifications, and Conclusions
(hence 5Cs). The 5Cs model is a Case And Rule Driven (CARD) system for
Knowledge Acquisition (KA) that uses a Case Oriented Rule Acquisition Language
(CORAL) to acquire rules in a similar way to Multiple Classification Ripple Down
Rules (MCRDR) [7, 8] and its predecessor the Single Classification Ripple Down
Rules (SCRDR) [4], but with significant extensions. For example, new data structures
and algorithms are presented to allow experts to more effectively collaborate in
building up both the knowledge and case bases.
The extensions have been motivated by our work in developing a troubleshooting
system for a high volume ICT support centre. Knowledge in this domain changes
rapidly and is driven by the need to maintain and link problem and solution cases. For
this reason we chose to use the MCRDR combined case and rule based approach to
incremental KA. However, in this domain we found that the knowledge needed to
identify and solve the problem came from many, varied and globally distributed
sources, and that the cases themselves were often in a state of flux and needed to be
[Figure: example 5Cs relational structure linking Cases 1-2 to ConditionNodes 1-8 (rules 1-8, with rule 1 TRUE at the root node), Classifications 1-2, and Conclusions 1-2]
Unlike many MCRDR systems which only require cornerstone cases to be kept, in a
domain like the call center where the cases are volatile we keep all cases and track all
Case-RuleNode associations. This is somewhat similar to the use of an execution
history, tracking of rule usage and the proposed review of rule usage against the case
history in the HeurEAKA RDR system [1], which uses NRDR and genetic algorithms
for channel routing in VLSI design. In the 5Cs model, a “Tracked” Case is one whose
Live and Registered RuleNodes are being remembered by the system. A Live
RuleNode for a given Case is one that is currently the last TRUE RuleNode on a given
path through the knowledge base for that case. Its conclusions are part of the set of
current conclusions derived from the knowledge base for the case. The system
remembers its Live RuleNodes for “Tracked” cases. Live RuleNodes may be correct
or incorrect. Correcting incorrect live RuleNodes is the primary role of a (human or
computer) expert who trains the knowledge base.
A Registered RuleNode is one that has been confirmed by a User as being correct
and TRUE for that Case. For each case, each RuleNode registration may be current or
expired. A registration is current if the last modification or creation date on the RuleNode-Case association is at least as recent as the last-modified dates on both the RuleNode and the Case; otherwise, the registration has expired. The expiry of registered case-RuleNode
relationships is something that can be notified and displayed to users in the summary
of cases or RuleNodes of interest to them. The user can also be notified whenever the
list of live RuleNodes differs from the list of registered RuleNodes for a given case,
or the list of live cases differs from the list of registered cases for a given RuleNode.
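A minimal sketch of the registration-expiry test described above, assuming only that each of the three objects exposes a last-modified (or creation) timestamp; the class and attribute names are illustrative, not taken from FastFIX:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Timestamped:
    last_modified: datetime   # creation date if the object was never modified

def registration_is_current(association: Timestamped,
                            rule_node: Timestamped,
                            case: Timestamped) -> bool:
    # A RuleNode-Case registration is current if the association was touched
    # at least as recently as both the RuleNode and the Case; otherwise expired.
    return (association.last_modified >= rule_node.last_modified and
            association.last_modified >= case.last_modified)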
In the Pathology Expert Interpretative Reporting System (PEIRS), typographical or conceptual errors in RuleNode expressions were corrected with the use of “fall-through” rules [5, p. 119], potentially resulting in corruption of the KBS [5, p. 88].
Similarly, Kang identifies the situation where the domain knowledge represented by
an existing rule tree needs to be changed in such a fashion that a cornerstone case for
an existing RuleNode will drop-down to a new child RuleNode [7, p.50]. He suggests
that if absolutely necessary, the rules suggested by the MCRDR difference list for the
new RuleNode can be overridden [7, p. 65]. The approach was not fully explored [7,
p. 67]. Note that cases don’t only drop-down. They may drop-across from a sibling
node; or they can recoil to an ancestor RuleNode for example when the editing or
relocation of a dependent RuleNode is restrictive enough that the case under review is
now excluded.
Unlike in pathology, where each report handled a new case and changes to a patient's data resulted in a new case with a new conclusion, in the Call Centre domain it is likely that information about a customer problem will develop over days or even months where a problem is intermittent or temporarily fixed but re-emerges at a later date. A supposed solution to a problem may turn out not to have really fixed the problem at all. This means that the knowledge (i.e., the rules) and the
problem/solution case/s may also need changing. The 5Cs structure and algorithms
allow the knowledge base to evolve, including changes to cases, RuleNodes,
intermediate conclusions, ontological entry or attributes. This is achieved through
tracking of live vs registered case-RuleNode associations.
In fact, FastFIX tracks all changes to the system and identifies users when an
inconsistency in the system occurs. The strategy assists with more rapid knowledge
acquisition since it can highlight inconsistencies between expert opinions, just as
MCRDR supported quicker KA by allowing more than one rule to be acquired for
each case. It lets users capitalize on the knowledge acquisition opportunity presented
by the case drop-down scenario, and this in turn may result in quicker coverage of the
domain and greater learning opportunities for users. The separation of live and
registered case-RuleNode associations is a key part of being able to resolve
classification conflicts between multiple experts, and even between what a single
expert thinks today, as compared with tomorrow [6].
As mentioned earlier, a subset of the features proposed by the 5Cs model was tested
in the ICT support center problem domain via a prototype system known as FastFIX.
Significant novel ideas implemented in the FastFIX prototype and tested during the
FastFIX software trial included:
• The ability for multiple users to build an MCRDR-based decision tree in a wiki1-
style collaborative effort. This includes the identification of classes of incoming
problem cases and manual indexing of solutions by multiple users using rule
conditions equivalent to logical tags in a folksonomy2.
• Reference to multiple exemplar cornerstone cases for each RuleNode.
• The ability for users to edit previously created cases (including cornerstone cases)
and RuleNodes in the system.
• Continuous background monitoring of changes to the knowledge base so that users
with affected RuleNodes and Cases can notice and respond to the changes. This
approach allows classification conflicts to be identified, clarified and resolved and
hence it enhances knowledge acquisition.
• The ability for users to “work-up” a case using a novel Interactive and Recursive
MCRDR decision structure.
• Separation between classifications and conclusions so that richer classification
relationships can be maintained.
• The ability for users to relocate i.e. move RuleNodes in the system.
1 Wikipedia (http://www.wikipedia.org/) defines a Wiki as the collaborative software and resultant web forum that allows users to add content to a website and, in addition, to collaboratively edit it. Wikipedia demonstrates the power of the Wiki paradigm.
2 The term "folksonomy" was first coined by Thomas Vander Wal (2005) to describe forums in which people can tag anything that is Internet addressable using their own vocabulary so that it is easy for them to re-find the item.
User ID                                                        1   2   3   4   5   6   7   8   9  10  11  12  Totals
Total Case Creations                                          64   0  15   2   2  83   1   1   0   0   1   3     172
Total RuleNode Creations                                      42   0  13   0   0  30   1   0   0   0   0  21     107
Total Case Edits                                              45   0  22   0   0  33   0   0   2   0   7  30     139
Total RuleNode Edits                                          32   0  13   0   0  59   0   0   0   0   0  37     141
Total Case drop-throughs resulting from RuleNode Creations    59   0   1   0   0  43   0   0   0   0   0   1     104
Total Case drop-through Events resulting from RuleNode Edits  24   0   6   0   0  48   0   0   0   0   0  18      96
[Figure: User KA Activity (number of KA events) by User ID 1-12]
After 7 hours of effort the test team had captured 105 cases and 55 RuleNodes. The red arrow in each of the following figures indicates this point in time. At this point the team had provided enough RuleNodes to automatically solve approximately 90% of errors on errant equipment in the selected error sub-domain. These errors contribute 30% of all errors seen by the system, which in turn account for 20% of the ~5,000 problem cases per day seen by the global ICT support centre. Hence, after 7 hours of effort, enough knowledge had been acquired to automatically provide solutions to more than 270 cases per day, without requiring the trouble-shooters to figure out the class of problem at hand, where to search for a solution, or what to search for, for example in the corporate solution tracking system.
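The arithmetic behind this estimate, using the percentages quoted above, is roughly:

$0.90 \times 0.30 \times 0.20 \times 5{,}000 \approx 270$ cases per day.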
Say that each case takes on average 15 minutes to solve, and that 1 minute of this time is spent in determining the problem and finding its solution. This represents a time saving of 1 minute * 270 cases per day, or 4.5 hours per day. In practice, the average solution search time is possibly a lot longer. One of the problems with manually searching for solutions is that if you have not found the answer, you do not know whether it is because you are not searching for it correctly or because a solution does not exist. The FastFIX system has the advantage that it unambiguously associates relevant solutions with their incoming problem classes. If the answer is unknown, FastFIX can provide that information.
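Written out, the saving quoted above is simply:

$270 \text{ cases/day} \times 1 \text{ min/case} = 270 \text{ min/day} = 4.5$ hours per day.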
After the first 105 cases and 55 RuleNodes, the test team broadened the
knowledge domain being covered to include a new error sub-domain. This
evaluation strategy parallels that used in the four-year PEIRS SCRDR
software trial in which additional domains were incorporated incrementally after
the pathologists had gained confidence in the performance of the system with the
initially selected thyroid domain [5, p.90]. The trial of the prototype ceased after
107 RuleNodes and 172 cases had been accumulated. At that point, no new
information was being gathered and attention was turned to additional features to
enhance the system.
Fig. 3 shows the cumulative case and RuleNode creations and Case drop-throughs resulting from RuleNode Creations in greater detail. A unique KA Event ID has been assigned to each unique timestamp captured in this subset of data and it has been used to construct the x-axis. It can be observed in Fig. 3 that the first 20
RuleNodes were provided to the system in a top-down (and hence rule-driven
linear) manner. In contrast, RuleNodes 21 to 55 were provided mostly on the
basis of cases seen in a case-driven bottom-up monotonically increasing and
stochastic manner as described in [10]. After this point, users were selective in
choosing which cases to train the system with, choosing cases that were expected
to be novel. Hence RuleNodes 56 to 107 were provided to the system in a more
top-down (and hence rule-driven linear) manner as for the first 20 RuleNodes.
It is difficult to say how the ability of users to edit RuleNodes affects the
overall case and RuleNode creation trajectories. If most of the RuleNode edits
were cosmetic, e.g., the result of fixing spelling mistakes, then it can be expected that
these KA trajectories would be little affected by the RuleNode edits. However, if
RuleNode editing represents a significant KA activity whereby genuinely new
knowledge is being acquired, rather than existing knowledge being cosmetically
corrected, then those RuleNode edit events should be added into the above case-
driven KA trajectory. However, it was beyond the scope of this trial to examine
this in any detail.
Fig. 4 shows the Cumulative Case Creation and RuleNode Edit Curves. The
number of RuleNode edits appears to grow in proportion to the number of cases
seen by the system, which indicates that RuleNode editing tends to be a top-down
knowledge acquisition activity.
[Fig. 3: cumulative Case Creations, RuleNode Creations, and Case Drop-throughs plotted against KA Event ID]
[Fig. 4: cumulative Case Creation and RuleNode Edit curves plotted against KA Event ID]
Individual KA Curves are displayed in the next four figures for the 3 most active users
(with User IDs 1, 6, and 12) in the system, including the author (User ID 12).
Vertical lines have been included in the graphs to show the co-occurrence of RuleNode
Creations and their resultant case drop-downs, as well as RuleNode Edits and their
resultant case drop-down events.
Case edit events have been left out of the curves to allow the rate of RuleNode
accumulation to be compared with the rate of Case accumulation.
In Fig. 5, User 1’s KA curves show a steady rate of accumulation of both cases and
RuleNodes. It appears that RuleNodes are acquired bottom-up prior to the domain
change, and top-down thereafter.
In Fig. 6, User 6’s KA curves show a steady rate of accumulation of both cases and
RuleNodes. As for User 1 it appears that RuleNodes are acquired bottom-up prior to the
domain change, and top-down thereafter. We can also see that User 6 undertook a major
RuleNode editing activity between KA event 80 and 100. This effort was focussed on
widening the scope of the rule statements in a number of RuleNodes.
In Fig. 7, User 12’s (i.e., the author and researcher’s) KA curves show a focus on
RuleNode edits in the early phases. At this point the system was still under development
so both the users and the system were changing in the way they interacted with each other.
User 12 created the first 20 RuleNodes in the system in a top-down fashion after
consulting with User 6. In contrast, User 12 was involved in very few case creations.
Note that the initial knowledge base activity by user 12 parallels that reported
in the early days of the PEIRS trial. In PEIRS the first 198 rules were added off-
line while interfacing problems were sorted out [5, 7]. In FastFIX, the first 21
RuleNodes were added in this manner.
4 Conclusions
Acknowledgments. This work has been funded via an Australian Research Council
Linkage Grant LP0347995 and a Macquarie University RAACE scholarship. Thanks
also to my industry sponsor for ongoing financial support.
References
1. Bekmann, J. and Hoffmann, A. (2004) HeurEAKA – A New Approach for Adapting GAs to the Problem Domain. In: Zhang, Guesgen, Yeap (eds.) PRICAI 2004: Trends in Artificial Intelligence, Springer, Berlin, 2004, pp. 361-372.
2. Beydoun, G. and Hoffmann, A. (2000). Incremental acquisition of search knowledge.
Journal of Human-Computer Studies, 52:493-530, 2000.
3. Beydoun, G., Hoffmann, Fernandez Breis, J. T, Martinez Bejar, R. Valencia-Garcia, R.
and Aurum, A. (2005) Cooperative Modeling Evaluated, International Journal of
Cooperative Information Systems, World Scientific, 2005, 14 (1), 45-71.
4. Compton, P. J. and R. Jansen (1989). A philosophical basis for knowledge acquisition. 3rd
European Knowledge Acquisition for Knowledge-Based Systems Workshop, Paris: 75-89.
5. Edwards, G. (1996) Reflective Expert Systems in Clinical Pathology MD Thesis, School
of Pathology, University of New South Wales.
6. Gaines B. R., Shaw M. L. G., (1989) Comparing Conceptual Structures: Consensus,
Conflict, Correspondence and Contrast.
7. Kang, B. (1996) Validating Knowledge Acquisition: Multiple Classification Ripple Down
Rules PhD Thesis, School of Computer Science and Engineering, University of NSW,
Australia.
8. Kang, B., P. Compton and P. Preston (1995). Multiple Classification Ripple Down Rules :
Evaluation and Possibilities. Proceedings of the 9th AAAI-Sponsored Banff Knowledge
Acquisition for Knowledge-Based Systems Workshop, Banff, Canada, University of
Calgary.
9. Richards, D and Menzies, T. (1998) Extending the SISYPHUS III Experiment from a
Knowledge Engineering to a Requirements Engineering Task, Richards D., Menzies, T.,
Task 11th Workshop on Knowledge Acquisition, Modeling and Management, SRDG
Publications, Banff, Canada, 18th-23rd April, 1998.
10. Vazey, M. (2006) Stochastic Foundations for the Case-driven Acquisition of Classification
Rules. EKAW 2006 (in publication).
11. Vazey, M. and Richards, D. (2004) Achieving Rapid Knowledge Acquisition. Proceedings
of the Pacific Knowledge Acquisition Workshop (PKAW 2004), in conjunction with The
Eighth Pacific Rim International Conference on Artificial Intelligence, August 9-13, 2004,
Auckland, New Zealand, 74-86.
12. Vazey, M. and Richards, D. (2006) A Case-Classification-Conclusion 3Cs Approach to
Knowledge Acquisition - Applying a Classification Logic Wiki to the Problem Solving
Process. International Journal of Knowledge Management (IJKM), Vol. 2, Issue 1, pp 72-
88; article #ITJ3096, Jan-Mar 2006.
13. Wille, R. (1992) Concept Lattices and Conceptual Knowledge Systems Computers Math.
Applic. (23) 6-9: 493-515.
Intelligent Decision Support for Medication Review
1 Introduction
Sub-optimal drug usage is a serious concern both in Australia and overseas [1, 2],
resulting in at least 80,000 hospital admissions annually - approximately 12% of
all medical admissions and reflecting a cost of about $400 million annually, with
the majority of these affecting elderly patients [3]. MRs are seen as an effective
way to improve drug usage. However, the quality of MRs produced is inconsistent
across reviewers. Further to this, many community-based pharmacists are still
unwilling to undertake this new role, citing reasons including fear of error and a
lack of confidence [4].
This paper proposes a different approach to improving the quality of MRs, and possibly even improving the uptake of the role within the pharmaceutical community. It is suggested that the answer may lie in the development of medication management software which includes Intelligent Decision Support features. To date, the majority of incarnations of medication management software for producing MRs have lacked any form of genuine Decision Support features [5]. Unfortunately, Knowledge Based System (KBS) techniques which may be suitable for this problem have been designed to handle steadfast, well-defined sets of knowledge, and have historically not been well suited to poorly structured or dynamic sets of knowledge such as that found in the domain of MR. However, newer techniques such as Case Based Reasoning (CBR) and Ripple Down Rules (RDR) may offer new possibilities in handling knowledge of this kind, since they are easily, even naturally, maintainable and alterable [6, 7].
2 Medication Reviews
MR is a burgeoning area in Australia and other countries, with MRs seen as an effective way of improving drug usage and reducing drug related hospital admissions, particularly in the elderly and other high risk patients [1, 3]. This has prompted the Australian government to initiate the Home Medicines Review (HMR) scheme and the Residential Medication Management Review (RMMR) scheme. These schemes provide remuneration to pharmacists performing MRs via a nationally funded program [3]. However, despite RMMRs being introduced in 1997, they still do not have a conceptual model for delivery, which has resulted in a wide range of differing qualities of service being provided [4].
To perform a MR, pharmacists assess potential Drug Related Problems (DRPs) and Adverse Drug Events (ADEs)1 in a patient by examining various patient records, primarily their medical history, any available pathology results, and their drug regime (past and current) [8]. The expert looks for a variety of indicators in the case details provided, checking for known problems such as: an Untreated Indication – where a patient has a medical condition which requires treatment but is not receiving it; Contributing Drugs – where a patient has a condition and is on a drug which can cause or exacerbate that condition; High Dosage – where a patient is potentially on too high a dosage because of a combination of drugs with similar ingredients; an Inappropriate Drug – where a patient is on a drug designed to treat a condition they do not seem to have, or one that is contraindicated in their condition; and many others besides. Once these indicators have been identified, a statement is produced explaining each problem, or potential problem, and often the appropriate course of action.
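The indicator checks described above can be pictured as simple predicates over a patient record. The following Python sketch is purely illustrative: the field names, drug tables and example patient are invented and carry no clinical meaning.

# Hypothetical drug-knowledge tables.
TREATS = {"metformin": {"diabetes"}}              # drug -> conditions it treats
CAN_EXACERBATE = {"ibuprofen": {"hypertension"}}  # drug -> conditions it can worsen

def untreated_indications(patient):
    # Conditions for which no drug in the regime provides treatment.
    treated = set()
    for drug in patient["drugs"]:
        treated |= TREATS.get(drug, set())
    return patient["conditions"] - treated

def contributing_drugs(patient):
    # Drugs in the regime that can cause or exacerbate an existing condition.
    return [d for d in patient["drugs"]
            if CAN_EXACERBATE.get(d, set()) & patient["conditions"]]

patient = {"conditions": {"diabetes", "hypertension"},
           "drugs": {"metformin", "ibuprofen"}}
print(untreated_indications(patient))   # {'hypertension'}
print(contributing_drugs(patient))      # ['ibuprofen']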
3 Methodology
In order to produce a medication management system with intelligent decision support features it was necessary to produce two major software elements. The first was a standard implementation of a database “front-end” from which a user can enter all the details of a given patient’s case, or at least those parts which are relevant to the chosen domain. The second was an implementation of a Multiple Classification Ripple Down Rules engine which can sufficiently encapsulate the types of conditions and knowledge required for the domain and facilitate the design of an interface from which the engine can be operated, particularly during the Knowledge Acquisition phase.
3.1 Database
The design of the database to store the MR cases was relatively trivial, and will not be
given much detail here. The preliminary design idea was taken from existing
1 Defined by the World Health Organisation as “an injury resulting from medical intervention related to a drug” [2].
Ripple Down Rules (RDR) is an approach to building KBSs that allows the user to
incrementally build the knowledge base while the system is in use, with no outside
assistance or training from a knowledge engineer [7]. It generally follows a forward-
chaining rule-based approach to building a KBS. However, it differs from standard rule
based systems since new rules are added in the context in which they are suggested.
Observations from attempts at expert system maintenance led to the realisation
the expert often provides justification for why their conclusion is correct, rather than
providing the reasoning process they undertook to reach this conclusion. That is, they
say ‘why’ a conclusion is right, rather than ‘how’. An example of this would be the
expert stating “I know case A has conclusion X because they exhibit features 1, 4 and
7”. Furthermore, experts are seen to be particularly good at providing comparison be-
tween two cases and distinguishing the features which are relevant to their different
classifications [10]. With these observations in mind an attempt was made at producing
a system which mimicked this approach to reasoning, with RDR being the end result.
3.3 Structure
The resultant RDR structure is that of a binary tree or a decision list [11], with exceptions to rules which are themselves further decision lists. The decision list model is more intuitive since, in practice, the tree would have a fairly shallow depth of correction [12].
The inferencing process works by evaluating each rule in the first list in turn until a
rule is satisfied, then evaluating each rule of the decision list returned by that satisfied
rule similarly until no further rules are satisfied. The classification that was bound to
the last rule that was satisfied is given.
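A minimal sketch of this inference process (our own simplified representation, not the implementation used here): each rule carries a condition, a classification and an exception list, which is itself a decision list evaluated only when the rule fires.

# Single-classification RDR inference over a decision list with exceptions.
def scrdr_infer(decision_list, case, default=None):
    for condition, classification, exceptions in decision_list:
        if condition(case):
            # The last satisfied rule on the path gives the conclusion,
            # so a satisfied exception overrides this rule's classification.
            return scrdr_infer(exceptions, case, default=classification)
    return default

# Hypothetical toy knowledge base.
kb = [
    (lambda c: c["temp"] < 10, "cold",
     [(lambda c: c["windy"], "bitterly cold", [])]),
    (lambda c: c["rain"], "wet", []),
]
print(scrdr_infer(kb, {"temp": 5, "windy": True, "rain": False}))  # bitterly cold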
RDR can be viewed as an enhancement to CBR [6, 13, 14], with RDR providing a
utility, in the form of an algorithm, a structure and rules, with which to demonstrate
which parts of the case are significant to a particular classification [15].
The RDR method described above is limited by its inability to produce multiple conclusions for a case. To allow for this capability, as this domain requires, MCRDR should be considered [16], avoiding the exponential growth of the knowledge base that would result were compound classifications to be used.
MCRDR is extremely similar to RDR, preserving the advantages and essential
strategy of RDR, but augmented with the power to return multiple classifications.
Contrasting with RDR, MCRDR evaluates all rules in the first level of the knowledge
base then evaluates the next level for all rules that were satisfied and so on, maintain-
ing a list of classifications that should fire, until there are no more children to evaluate
or none of the rules can be satisfied by the current case [12]. An example of this can
be seen in Fig. 1.
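The multiple-classification traversal can be sketched in the same style (again a simplified stand-in, not the engine built for this study): every rule at a level is evaluated, and a fired rule's classification is reported only if none of its refining children also fire.

def mcrdr_infer(rules, case):
    # Returns the classifications of the last true rule on every satisfied path.
    classifications = []
    for rule in rules:
        if rule["condition"](case):
            child_results = mcrdr_infer(rule["children"], case)
            if child_results:
                classifications.extend(child_results)
            elif rule["classification"] is not None:   # None models a stopping rule
                classifications.append(rule["classification"])
    return classifications

# Hypothetical toy rules in the spirit of the (cold, rain, windy) case of Fig. 1.
rules = [
    {"condition": lambda c: c["cold"], "classification": "wear a coat",
     "children": [{"condition": lambda c: c["windy"],
                   "classification": "wear a windproof coat", "children": []}]},
    {"condition": lambda c: c["rain"], "classification": "take an umbrella",
     "children": []},
]
print(mcrdr_infer(rules, {"cold": True, "rain": True, "windy": True}))
# ['wear a windproof coat', 'take an umbrella']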
Fig. 1. The highlighted boxes represent rules that are satisfied for the case (cold, rain, windy),
the dashed box is a potential stopping rule the expert may wish to add [17]
Table 1. Example of a decision list from [7, 15, 17, 19]. The list can contain negated
conditions.
To determine where the new rule must go it must first be determined what type of
wrong classification is being made. The three possibilities are listed in Table 2.
Table 2. The three ways in which new rules correct the knowledge base [12]
The system was handed over to the expert with absolutely no knowledge or conclu-
sions pre-loaded. The expert was wholly responsible for populating the knowledge
base. Over the course of approximately 15 hours they were able to add the rules re-
quired to correctly classify 84 genuine MR cases that had been pre-loaded into the
system.
It is observed in Fig. 2 that the number of rules in the system progressed linearly as
more cases were analysed, at a reasonably consistent rate of about 2.3 rules per case.
This suggests that the system was still in a heavy learning phase when the experiment
was finished, since it has previously been observed that RDR systems will show a
flattening pattern in the rate of growth of the knowledge base at approximately 80%
of domain coverage [12]. This has implications for many of the remaining tests, in that their results must be understood to reflect the knowledge base while it is still
learning heavily. The general conclusion that can be applied here is that most results
will be expected to improve with additional testing, and that further testing is indeed
required. This is because without demonstrating that the rate of learning has begun to
slow down it is impossible to adequately prove that the heavy learning phase, which
requires a significantly higher level of expert maintenance, will cease.
[Chart for Fig. 2: Rules in System vs. Cases Analysed, linear trend with R² = 0.9864]
Fig. 2. The number of rules in the system grows linearly as more cases are analysed
It was estimated by the expert at the time of cessation of the experiment that the system had encapsulated around 60% of the domain [20]; this estimate is supported by the evidence shown in Fig. 3. It can be seen that the average number of correct classifications the system provided rose quite steadily into the 60% range, although the percentage correct from case to case did vary quite a lot, as is to be expected while the system is still in the heavy learning phase.
The expert predicted potential classification rates in the order of 90% [20], so, considering 84 cases had been analysed, it could be estimated that in order to
[Chart for Fig. 3: percentage of correct conclusions provided vs. cases analysed, trend with R² = 0.6883]
Fig. 3. The percentage of conclusions provided by the system for each case that were already
correct
reach this rate at least another 40 cases should be analysed, and it would be unexpected if the number of additional cases needing to be analysed exceeded about 120, based on figures previously found for systems of this kind [12, 18]. These figures are justified by following the trend-line in Fig. 3, which shows the clustered average of correct conclusions provided by the system for each group of 5 cases analysed, although it is conceded that this trend-line is only a rough approximation. If it is followed linearly as demonstrated thus far, it reaches 90% at approximately 120 cases; if it is assumed that this trend-line will begin to plateau, as expected, the number of extra cases required may grow considerably, reflecting the slower rate of learning.
The results shown in Fig. 4 are very convincing, with the system sometimes finding half again as many classifications per case as the expert and quite consistently remaining at least one classification ahead. It should be re-iterated that the system found all these classifications using only a subset of the knowledge the expert had. This suggests the expert consistently misses classifications they should find; in other words, they simply do not notice them on the particular case. The system does not suffer from this: it will notice anything that it has been trained to know about, without exception.
[Chart for Fig. 4: classifications found by system vs. found by expert, per case number]
Fig. 4. The system found significantly more correct classifications than the expert
It was found that the expert often appended classifications to previous cases after the system's prompting, particularly early in the system's training. Evidence of this is shown in Fig. 5. The percentage reduced dramatically even after only a small number of cases, suggesting the system was rapidly helping to reduce the expert's rate of missed classifications by suggesting the classifications for them, rather than requiring the expert to notice them themselves. The trend-line in Fig. 5 is only an approximation, since relatively few cases have been analysed thus far and noise is still significant.
[Chart for Fig. 5: percentage of errors (classifications appended to previous cases) vs. cases analysed, trend with R² = 0.6495]
It was found that the rate of error in each case was quite high, averaging 13.4% and with some cases going over 50%. Clearly the expert is making errors regularly, as was expected, and these numbers would be expected to be even higher were more complete training done. It is important to note that the results shown in Fig. 6 represent all the errors (missed classifications) that the system has fixed through the normal course of operation, and not the actual number of errors per case. This figure suggests that over 1 in 9 classifications are missed, although it is unclear what type of classifications these are. What level of threat are these classifications likely to pose? One would like to assume that the expert would not miss life-threatening classifications, because they would have a particular focus on these, but additional experimentation is clearly required to determine what kinds of classifications the expert is missing and what the consequences of this are.
It was considered possible that the domain of MR might damage the maintainability and usability of the system due to both its inconsistent and dynamic nature and the large number of variables within each case. The dynamic nature of the domain might result in a need for an excessive number of exceptions to be added to the knowledge base, and the large number of variables within each case might result in an excessive number of conditions being required for each rule. Each of these problems would increase the time taken to maintain and use the system and possibly make it untenable. As such, tests were carried out to determine whether these figures deviated markedly from the normal range to be expected in a system of this nature.
[Chart for Fig. 6: Overall Error % per case]
[Chart: Cornerstones Seen]
The results here are promising from a usability point of view, with the expert rarely having to consider cornerstone cases in the creation of rules, and the majority of rules having no cornerstone cases to consider at all. In fact, the expert saw an average of only 0.42 cornerstones per rule. This means that the expert should be able to add rules relatively quickly, with the time required to validate their rules being small.
It can be determined from Table 3 that the structure of the knowledge base tree was extremely shallow and branchy, meaning that the possibility of an excessive number of exceptions has not, at least at this point, come to light at all.
The nature of the rules in the knowledge base is also of interest, with further support for the maintainability of the system shown by the fact that the average number of conditions selected in a rule was only 1.7, with longer rules of 4 or 5 conditions being virtually non-existent and no rules with 6 or more conditions present.
To get a more complete view of the knowledge base it is necessary to analyse which outputs the rules map to. In the knowledge base that was built in the course of this experiment, 85 individual conclusions were defined. When it is considered that every rule except the stopping rules is linked to a conclusion, and that there are 154 such conclusion-bearing rules, it can be seen that there is 1 conclusion for every 1.8 rules, as demonstrated by the data used in Fig. 7. It is evident from this figure that, although most conclusions are used by only one rule, some conclusions are used very often; in other words, they have many different sets of conditions which can lead to them.
[Chart for Fig. 7: number of rule uses per conclusion]
5 Conclusions
Initial experimentation suggests that the proposed method using MCRDR can successfully represent knowledge where the knowledge sources (human experts) are inconsistent. The system is shown to have reached about a 60% classification rate with less than 15 expert hours and only 84 cases classified - a good outcome in the circumstances. The knowledge base structure does not show any major deviations
from what would be anticipated in a normal MCRDR system at this stage. The maintainability of the system does not appear to have been adversely affected thus far, with the expert being faced with only a few cornerstone cases during knowledge base validation, and the time taken to add rules being negligible.
From a MR perspective the system is seen to be capable of: providing classifications for a wide range of Drug Related Problems; learning a large portion of the domain of MRs quickly; producing classifications in a timely manner; and, importantly, vastly reducing the number of missed classifications that would otherwise be expected of the reviewer. It is expected that this system, or a future incarnation of it, would be capable of achieving classification rates around 90% [20]. If this figure is realised, it is possible that this system would be capable of achieving three major goals:
• Reducing the number of missed classifications
o Thus improving the consistency (quality) of service
• Not adversely affecting the time taken to perform a MR
• Improving the confidence of potential medication reviewers
It has already been noted that the number of errors this system detected and repaired was significant, and that the number of errors was seen to reduce as the expert populated the knowledge base; this result alone would be enough to warrant further work. It has also been observed that the amount of time taken to perform a MR using this system should not be adversely affected. As for the final point, it is anticipated that a system such as this might improve reviewers' confidence by providing a reliable second level of checking for their conclusions, since the system is designed and trained to act as an expert in the field would.
6 Further Work
It should be noted that the system built for this study was intended only as an initial proof of concept. Further testing is needed over a broader range of cases to verify the results shown in this paper; however, initial testing does not suggest that any insurmountable problems will arise. On top of this, the system could be made more powerful and could better encompass the domain by including the additional features mentioned below.
An important feature that was missing from the prototype was the handling of time series data, such that the expert would be able to define conditions such as "increasing" or "decreasing" for attributes like Weight, Blood Pressure, or a pathology result. Further, they might define conditions like "recent" or "old", which check whether a result is older or younger than a defined threshold, as well as newest, oldest, average and others. As the system stands, it will fire on a rule that states "Creatinine > 0.12" even if the result showing a Creatinine level of 0.13 was taken 15 years prior. This is undesirable, since the meaning of results varies over time and the expert may wish to define rules based on different types of results.
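One way such time-aware conditions could be expressed is sketched below, assuming each pathology result is stored as a (date, value) pair; the attribute name, values and thresholds are invented for illustration.

from datetime import date, timedelta

# Each result is a (date, value) pair; the most recent is not necessarily last.
creatinine = [(date(1990, 3, 1), 0.13), (date(2005, 8, 20), 0.09)]

def newest(results):
    return max(results)[1]                        # value of the most recent result

def is_recent(results, days=365):
    return (date.today() - max(results)[0]) <= timedelta(days=days)

def is_increasing(results):
    ordered = [v for _, v in sorted(results)]     # values in date order
    return all(a < b for a, b in zip(ordered, ordered[1:]))

# A time-aware version of the rule "Creatinine > 0.12":
fires = newest(creatinine) > 0.12 and is_recent(creatinine)
print(fires)   # False: the 0.13 reading is 15 years old and is not the newest result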
6.2 Standardisation
It was observed that the knowledge acquisition workload is increased when inconsistent nomenclature is allowed, as it so often is in many medical systems. To prevent this increased workload for the expert, it would be prudent to derive and enforce a strict scheme for data input. A possible complication is that users may find it difficult to locate options which are not named as they expect. To handle this it would be possible to implement another interpretive layer of hierarchy, essentially allowing users to use their own preferred nomenclature and then defining within the system that their chosen nomenclature is synonymous with whichever standardised equivalent is selected by the system designers.
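Such an interpretive layer could be as simple as a synonym table that normalises each user's preferred term to the standardised equivalent chosen by the system designers; the terms below are invented examples.

# Hypothetical synonym table: user's preferred nomenclature -> standard term.
SYNONYMS = {
    "high blood pressure": "hypertension",
    "htn": "hypertension",
    "sugar diabetes": "diabetes mellitus",
}

def standardise(term: str) -> str:
    # Map a user-entered term onto the enforced standard scheme.
    normalised = term.strip().lower()
    return SYNONYMS.get(normalised, normalised)

print(standardise("HTN"))                 # hypertension
print(standardise("Diabetes Mellitus"))   # diabetes mellitus (already standard)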
References
1. Peterson, G., Continuing evidence of inappropriate medication usage in the elderly, in
Australian Pharmacist. 2004. p. 2.
2. Bates, D., et al., Incidence of adverse drug events and potential adverse drug events. Im-
plications for prevention. ADE Prevention Study Group. JAMA, 1995(274): p. 29-34.
3. Peterson, G., The future is now: the importance of medication review, in Australian Phar-
macist. 2002. p. 268-75.
4. Rigby, D., The challenge of change - establishing an HMR service in the pharmacy, in
Australian Pharmacist. 2004. p. 214-217.
5. Kinrade, W., Review of Domiciliary Medication Management Review Software. 2003,
Pharmacy Guild of Australia. p. 77.
6. Aamodt, A. and E. Plaza, Case-Based Reasoning: Foundational Issues, Methodological
Variations, and System Approaches, in AICom - Artificial Intelligence Communications.
1994. p. 39-59.
7. Compton, P., et al. Knowledge Acquisition without Analysis. in Knowledge Acquisition for
Knowledge-Based Systems. 1993. Springer Verlag.
8. Tenni, P., et al. to I. Bindoff. 2005.
9. Bonner, C., MediFlags. 2005.
10. Compton, P. and R. Jansen. A philosophical basis for knowledge acquisition. in European
Knowledge Acquisition for Knowledge-Based Systems. 1989. Paris.
11. Rivest, R., Learning Decision Lists, in Machine Learning. 1987. p. 229-246.
12. Kang, B., P. Compton, and P. Preston, Multiple Classification Ripple Down Rules. 1994.
13. Kolodner, J., R. Simpson, and K. Sycara-Cyranski. A Process Model of Cased-based Rea-
soning in Problem Solving. in International Joint Conference on Artificial Intelligence.
1985. Los Angeles: Morgan Kaufmann.
14. Kolodner, J.L., Special Issue on Case-Based Reasoning - Introduction. Machine Learning,
1993. 10(3): p. 195-199.
15. Kang, B. and P. Compton, A Maintenance Approach to Case Based Reasoning. 1994.
16. Kang, B., P. Compton, and P. Preston. Multiple Classification Ripple Down Rules:
Evaluation and Possibilities. in AIII-Sponsored Banff Knowledge Acquisition for Knowl-
edge-Based Systems. 1995. Banff.
17. Bindoff, I., An Intelligent Decision Support System for Medication Review, in Computing.
2005, University of Tasmania: Hobart. p. 65.
18. Preston, P., G. Edwards, and P. Compton. A 2000 Rule Expert System Without a Knowl-
edge Engineer. in AIII-Sponsored Banff Knowledge Acquisition for Knowledge-Based Sys-
tems. 1994. Banff.
19. Compton, P. and R. Jansen. Cognitive aspects of knowledge acquisition. in AAAI Spring
Consortium. 1992. Stanford.
20. Tenni, P. to I. Bindoff. 2005.
A Hybrid Browsing Mechanism Using
Conceptual Scales
1 Department of Computer Science Education, Catholic University of Daegu, 712-702, South Korea
[email protected]
2 School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia
[email protected]
1 Introduction
Formal Concept Analysis (FCA) was developed by Rudolf Wille in 1982 [17]. FCA is a theory of data analysis which identifies conceptual structures among data, based on the philosophical understanding of a ‘concept’ as a unit of thought comprising its extension and intension, as a way of modeling a domain. The extension of a concept is formed by all objects to which the concept applies and the intension consists of all attributes existing in those objects. This results in a lattice structure, where each node is specified by a set of objects and the attributes they share. The mathematical formulae of FCA can be regarded as a machine learning algorithm which can facilitate automatic document clustering. A key difference between FCA techniques and general clustering algorithms in Information Retrieval is that the formulae of FCA produce a concept lattice which provides all possible generalization and specialization relationships between object sets and attribute sets. This means that a concept lattice can represent the conceptual hierarchies inherent in the data of a particular domain.
FCA starts with a formal context, which is a binary relation between a set of objects and a set of attributes. It was defined for the document retrieval system that we proposed in [13] as follows: a formal context is a triple C = (D, K, I) where D is a set of documents (objects), K is a set of keywords (attributes) and I is a binary relation which indicates whether k is a keyword of a document d. If k is a keyword of d, this is written dIk or (d, k) ∈ I.
For the domain of research interests used for the experiments in the previous work [13] and in this paper, a document corresponds to a home page and a set of keywords is a set of research topics. That is, D is the set of home pages and K is the set of research topics for a context (D, K, I). However, the words documents and keywords are also used interchangeably to denote home pages (or simply pages) and research topics (or simply topics), respectively, in this paper.
From the formal context, formal concepts and a concept lattice are formulated. A formal concept consists of a pair comprising its extent and intent. The extension of a concept is formed by all objects to which the concept applies and the intension consists of all attributes existing in those objects. These generate a conceptual hierarchy for the domain by finding all possible formal concepts, which reflect certain relationships between attributes and objects. The resulting subconcept-superconcept relationships between formal concepts are expressed in a concept lattice, which can be seen as a semantic net providing “hierarchical conceptual clustering of the objects… and a representation of all implications between the attributes” [18, p. 493]. This implicit and explicit representation of the data allows a meaningful and comprehensible interpretation of the information. The lattice is used as the browsing structure. Fig. 1 shows an example of a lattice and a data structure for organizing documents in the lattice. More detailed formulae and explanations of FCA can be found in [9], [13].
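The derivation operators and the set of formal concepts can be computed directly from these definitions. The following naive enumeration (our own sketch with invented data, practical only for very small contexts) illustrates the idea:

from itertools import chain, combinations

# A toy formal context: home pages (documents) -> research topics (keywords).
context = {
    "page1": {"artificial intelligence", "machine learning"},
    "page2": {"artificial intelligence", "databases"},
    "page3": {"machine learning"},
}

def intent(docs):
    # Keywords shared by all documents in 'docs' (all keywords for the empty set).
    sets = [context[d] for d in docs]
    return set.intersection(*sets) if sets else set(chain.from_iterable(context.values()))

def extent(keywords):
    # Documents having every keyword in 'keywords'.
    return {d for d, ks in context.items() if keywords <= ks}

def formal_concepts():
    # Every (extent, intent) pair (A, B) with A' = B and B' = A.
    docs = list(context)
    concepts = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            b = intent(set(subset))   # B = S'
            a = extent(b)             # A = S'' (a closed extent)
            concepts.add((frozenset(a), frozenset(b)))
    return concepts

for a, b in sorted(formal_concepts(), key=lambda c: len(c[0])):
    print(sorted(a), sorted(b))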
Conceptual scaling has been introduced in order to deal with many-valued attributes [8],
[9]. Usually more than one attribute exists in an application domain and each attribute
may have a range of values so that there is a need to handle many values in a context.
Fig. 1. An example of a browsing structure. (a) Lattice structure. (b) Indexing of the lattice.
There are two ways in which we use conceptual scales. Firstly, ontological attributes can be used where readily available (e.g., person, academic position, research group and so on). These correspond to the more structured ontological properties used in systems such as OntoShare [6] and CREAM [12]. The key point of our approach is flexible, evolving ontological information, but there is no problem with using more fixed information if it is available; we have included such information for interest and completeness in conceptual scaling. Secondly, a user or a system manager can also group a set of keywords used for the annotation of documents. The groupings are then used for conceptual scaling.
The main difference between our approach and conceptual scaling in TOSCANA is that in our approach all the existing ontological attributes are scaled up together in the nested structure, whereas in the TOSCANA system only one attribute (i.e., a scale) can be combined into the outer structure of an attribute at a time.
Table 1. An example of the many-valued context for the domain of research interests
Fig. 2. Partially ordered multi-valued attributes for the domain of research interests
To explain this in a more formal way, the following definition is provided. For
example, the has-value relation ℜ on the attributes “person” and “position” is:
ℜ = {(academic staff, professor), (academic staff, associate professor), …, (research
staff, research assistant), …, (research student, Ph.D. student), (research student, ME
student)} from Fig. 2. This hierarchy of the many-valued context with the relation
ℜ is scaled into a nested structure using pop-up and pull-down menus.
Fig. 3. Examples of nested structures corresponding to concepts. This shows the outer structure
of the concepts “artificial intelligence” and “artificial intelligence, machine learning”
constructed from a set of home pages and their topics. Numbers in the lattice and in brackets
indicate the number of pages corresponding to the concept of the lattice and the attribute value,
respectively. The nested structure is presented in a hierarchy deploying all embedded inner
structures. The structure is implemented using pop-up and pull-down menus as shown in Fig. 4.
Fig. 4. An example of pop-up and pull-down menus for the nested structures of a concept
The page is an HTML file in a standard format that includes basic information
about the researcher, such as their first name, last name, e-mail address, position and
so on. The system parses the HTML file and extracts the values of the pre-defined
attributes. From the attributes and their extracted values, we formulate a nested
structure for a concept of the lattice at retrieval time. Note that attributes which do
not exist in the default home page can still be used for conceptual scaling; the user
needs to provide the values of those attributes when they assign a set of keywords for
their document. In this case, the user is given a simple interface for selecting values
by clicking, or a series of text boxes to be filled in. A rough sketch of the extraction
step is given below.
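As an illustration of this extraction step, the sketch below parses a home page and pulls out a few pre-defined attributes. The attribute set, the markup convention (elements carrying a class equal to the attribute name) and the use of BeautifulSoup are assumptions for illustration only, not details given above.

# Hypothetical sketch of the attribute extraction step; the markup convention
# (class names equal to attribute names) and the attribute set are assumptions.
from bs4 import BeautifulSoup

ATTRIBUTES = ["first_name", "last_name", "email", "position"]   # assumed attribute set

def extract_attributes(html_text):
    soup = BeautifulSoup(html_text, "html.parser")
    values = {}
    for attr in ATTRIBUTES:
        tag = soup.find(class_=attr)        # e.g. <span class="position">Professor</span>
        values[attr] = tag.get_text(strip=True) if tag else None
    return values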
A user can navigate recursively among the nested attributes, observing the
interrelationship between the attributes and the outer structure. By selecting one of the
nested items, the user can narrow the number of documents displayed. Again, the
structure formed by the most obvious attributes can be partly equivalent to the
ontological structure of the domain and is consequently regarded as an ontological
browser integrated into the lattice structure built from the keyword set.
Fig. 4 shows an example of pop-up and pull-down menus for the nested structure of
the concept “artificial intelligence” in Fig. 3. Menu ① appears when a user clicks
on the concept “artificial intelligence”. Each item of menu ① is equivalent to a scale in
the many-valued context. Suppose that the user selects the attribute Person in menu ①;
the system then displays a sub-menu of the attribute, as shown in menu ④.
3.2 Conceptual Scaling for Grouping Keywords
Conceptual scaling is also applied to group relevant values in the keyword sets used
for the annotation of documents. The groupings are determined as required, and their
scales are derived on the fly when a user’s query is associated with the groupings.
This means that the relevant group name(s) is included into the nested structure
dynamically at run time. Table 2 shows examples of groupings for scales in the one-
valued context for the attribute ‘keyword’. To deal with grouping for scales, the
following definition is provided:
Definition 2. Let a formal context C = (D, K, I) be given. A set G ⊆ K is a set of
grouping names (generic terms) of C if and only if for each keyword k ∈ K, either k ∈ G
or there exists some generic term κ ∈ G such that k is a sub-term of κ. We define S = K
\ G and a relation gen ⊆ G × S such that (g, s) ∈ gen if and only if s is a sub-term of g.
Table 2. Examples of grouping for scales in the one-valued context for the attribute ‘keyword’
Then, when a user’s query is qry ∈ G, a sub-formal context C′ = (D′, K′, I′) of (D, K,
I) is formulated, where K′ = {k ∈ K | k = qry or (qry, k) ∈ gen}, D′ = {d ∈ D | ∃k ∈ K′
such that dIk} and I′ = {(d, k) ∈ D′ × K′ | (d, k) ∈ I} ∪ {(d, qry) | d ∈ D′ and qry ∈ K′ ∩
G}. For instance, suppose that there are groupings as shown in Table 2 and a user’s
query “databases”. Since databases ∈ G, a sub-context C′ is constructed that includes a
scale for the grouping name databases, and a lattice of C′ is built. The user can then
navigate this lattice of C′. A sketch of this construction is given below.
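The sketch below follows the set definitions above; the dictionary-based representation of the context and of the gen relation is an assumption for illustration.

# Sketch of the sub-context construction described above.
# context: dict mapping each document d to its set of keywords (the relation I);
# gen: set of (group, sub_term) pairs; qry is a grouping name in G.
def sub_context(context, gen, qry):
    # K' = {qry} together with all sub-terms of qry
    K_prime = {qry} | {s for (g, s) in gen if g == qry}
    # D' = documents annotated with at least one keyword of K'
    D_prime = {d for d, kws in context.items() if kws & K_prime}
    # I' restricts I to D' x K' and additionally relates every document in D' to qry
    I_prime = {(d, k) for d in D_prime for k in context[d] & K_prime}
    I_prime |= {(d, qry) for d in D_prime}
    return D_prime, K_prime, I_prime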
Fig. 5 shows an example of a scale with the grouping name “databases”. The
grouping name is embedded into an item of the nested structure along with other
scales from the many-valued context described in the previous section. There are 12
documents with the concept “Databases” in the lattice, and the node (Databases, 12)
embeds the scales as shown in menu ①. The scale “Databases” was derived from the
groupings in the one-valued context, while the other scales (items) were derived from
the many-valued context (i.e., ontological attributes). A user can read that there is one
document related to “deductive databases”, two documents with “multimedia
databases”, etc. By selecting an item of sub-menu ②, the user can restrict the retrieved
documents to those associated with the selected sub-term.
A knowledge engineer or user can set up or change the groupings whenever required,
using a supporting tool (i.e., an ontology editor). When a grouping name with a set
of sub-terms is added, the system gets the set of documents that are associated with at
least one of the sub-terms of the grouping name. The context C is then refined to
have a binary relation between the grouping term and the documents related to the
sub-terms of the grouping term. Next, the lattice of C is reformulated whenever any
change in C is made. If a grouping name is changed, it is replaced with the changed
one in the context C and its lattice.
In the case of removal of a grouping from the hierarchy, no change is made in the
context C. With this mechanism, the outer lattice can always embed a node which
assembles all documents associated with the sub-terms of a grouping. That is, the
groupings play the role of intermediate nodes in the lattice that scale the relevant values.
Groupings can be formed with more than one level of hierarchy; that is, a sub-term of a
grouping can itself be a grouping of other sub-terms. A sketch of the refinement step
when a grouping is added is given below.
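As a minimal sketch of the refinement step just described (the dictionary representation of the context is an assumption), adding a grouping simply relates the grouping name to every document annotated with one of its sub-terms; the lattice would then be recomputed.

# Sketch of refining the context when a grouping name is added.
# context: dict mapping each document to its (mutable) set of keywords.
def add_grouping(context, group_name, sub_terms):
    sub_terms = set(sub_terms)
    for doc, keywords in context.items():
        if keywords & sub_terms:           # document uses at least one sub-term
            keywords.add(group_name)       # relate it to the grouping name as well
    return context                         # the lattice of C is rebuilt afterwards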
4 Implementation
To examine the value of conceptual scaling, a prototype has been implemented with a
test domain for research topics in the School of Computer Science and Engineering,
UNSW. There are around 150 research staff and students in the School who generally
have homepages indicating their research projects. The aim here was to allow staff
and students to freely annotate their pages so that they would be found appropriately
within the evolving lattice of research topics.
Fig. 6 shows an example of conceptual scaling for ontological attributes. It shows
examples of inner browsing structures corresponding to the concept “Artificial
Intelligence” of the outer lattice. We scale up ontological attributes into an inner nested
structure. The nested structure is constructed dynamically and associated with the
current concept of the outer structure. In other words, the nested attribute values are
extracted from the result pages. A nested pop-up menu appears when the user clicks on
the “nested” icon in front of the current node. If the user clicks on one of the
attribute items, the results change according to the selection. The user can
navigate recursively among the nested attributes.
For instance, suppose that the user selects the attribute items Position →
Academic Staff → Professor. The result then changes accordingly: the user can
see that there are four researchers whose research topic is “Artificial Intelligence” and
whose position is Professor. Numbers in brackets indicate the number of documents
(i.e., homepages) corresponding to the attribute value.
In addition, a knowledge engineer can arrange related terms by accessing a tool which
allows him or her to set up hierarchical groupings of related terms under a common name,
as described in Section 3.2. Then, when a user’s query is related to the grouping(s), the
grouping name is included in the inner structure on the fly. Fig. 7 shows an example
of conceptual scaling for the grouping “Databases”. The other items (i.e., School,
Research Groups, Position) are derived from the ontological attributes. There are 12
documents with the concept “Databases” in the lattice.
The user can read that there is one document related to “Mobile databases”, two
documents with “Multimedia databases” and so on. By selecting one of the grouping
members, the user can restrict the retrieved documents to those associated with the
selected sub-term.
References
1. Benjamins, V. R., Fensel, D., Decker, S., Perez, A. G.: (KA)²: Building Ontologies for
the Internet: a Mid-term Report, International Journal of Human Computer Studies (1999)
51(3): 687-712.
2. Carpineto, C., Romano, G.: Information retrieval through hybrid navigation of lattice
representations, International Journal of Human-Computer Studies (1996) 45:553-578.
3. Cimiano, P., Hotho, A., Stumme, G., Tane, J.: Conceptual Knowledge Processing with
Formal Concept Analysis and Ontologies, Proceedings of the Second International
Conference on Formal Concept Analysis (ICFCA 04), Springer-Verlag (2004) 189-207.
4. Cole, R., Eklund, P.: Browsing Semi-structured Web texts using Formal Concept Analysis,
Conceptual Structures: Broadening the Base, Proceedings of the 9th International
Conference on Conceptual Structures (ICCS 2001), Springer-Verlag (2001) 290-303.
5. Cole, R., Stumme, G.: CEM - A Conceptual Email Manager, Conceptual Structures:
Logical, Linguistic, and Computational Issues, Proceedings of the 8th International
Conference on Conceptual Structures (ICCS 2000), Springer-Verlag (2000) 438-452.
6. Davies, J., Duke, A., Sure, Y.: OntoShare – A Knowledge Environment for Virtual
Communities of Practice, Proceedings of the Second International Conference on
Knowledge Capture (K-CAP 2003), ACM, New York (2003) 20-27.
7. Eklund, P., Groh, B., Stumme, G., Wille, R.: A Contextual-Logic Extension of
TOSCANA, Conceptual Structures: Logical, Linguistic, and Computational Issues,
Proceedings of the 8th International Conference on Conceptual Structures (ICCS 2000),
Darmstadt, Springer-Verlag (2000) 453-467.
8. Ganter, B., Wille, R.: Conceptual Scaling, In: F. Roberts (eds.): Application of
Combinatorics and Graph Theory to the Biological and Social Sciences, Springer-Verlag
(1989) 139-167.
9. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations, Springer-
Verlag, Heidelberg (1999).
10. Godin, R., Missaoui, R., April, A.: Experimental comparison of navigation in a Galois
lattice with conventional information retrieval methods, International Journal of Man-
Machine Studies (1993) 38 :747-767.
11. Groh, G., Strahringer, S., Wille, R.: TOSCANA-Systems Based on Thesauri, Conceptual
Structures: Theory, Tools and Applications, Proceedings of the 6th International
Conference on Conceptual Structures (ICCS’98), Springer-Verlag (1998) 127-138.
12. Handschuh, S., Staab, S.: CREAM – CREAting Metadata for the Semantic Web,
Computer Networks (2003) 42:579-598.
13. Kim, M., Compton, P.: Evolutionary Document Management and Retrieval for Specialised
Domains on the Web, International Journal of Human Computer Studies (2004) 60(2): 201-241.
14. Priss, U.: Lattice-based Information Retrieval, Knowledge Organisation (2000) 27(3):132-142.
15. Quan, T.T., Hui, S.C., Fong, A.C.M., Cao, T.H.: Automatic Generation of Ontology for
Scholarly Semantic Web, The Semantic Web – ISWC 2004: Proceedings of the Third
International Semantic Web Conference, Hiroshima, Springer-Verlag (2004) 726-740.
16. Stumme, G.: Hierarchies of Conceptual Scales, 12th Banff Knowledge Acquisition,
Modelling and Management (KAW’99), Banff, Canada, SRDG Publication, University of
Calgary (1999) 5.5.1-18.
17. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts, In:
Ivan Rival (eds.), Ordered sets, Reidel, Dordrecht-Boston (1982) 445-470.
18. Wille, R.: Concept lattices and conceptual knowledge systems, Computers and
Mathematics with Applications (1992) 23:493-515.
Knowledge Representation for Video Assisted by
Domain-Specific Ontology
Abstract. Video analysis has typically been pursued in two different directions:
previous approaches have focused either on low-level descriptors, such as
dominant color, or on the video content, such as persons or objects. In this paper,
we present a video analysis environment that not only bridges these two directions
but can also extract and manage semantic metadata from multimedia content
autonomously, addressing the interaction between browsing and search
capabilities. Concretely, we implemented a tool that links MPEG-7 visual
descriptors to high-level, domain-specific concepts. Our approach is
ontology-driven, in the sense that we provide ontology-based, domain-specific
extensions of the standards for describing the knowledge of video content. In this
work, we consider one shot (episode) of billiard game video as the specific domain,
and we use this practical case to explain the process of representing video
knowledge. In the experimental part, we demonstrate the effectiveness of our
approach by comparing it with video content retrieval based on keywords only.
1 Introduction
Although new multimedia standards, such as MPEG-4 and MPEG-7 [1], provide the
functionalities needed to manipulate and transmit objects and metadata, their
extraction, most importantly at a semantic level, is outside the scope of these
standards and is left to the content developer. Extraction of low-level features and
object recognition are important phases in developing multimedia database
management systems.
There has been a research focus on developing techniques to annotate the content of
images on the Web using Web ontology languages such as RDF and OWL. Past
efforts have largely focused on mapping low-level image features to ontological
concepts [2] and have involved the development of tools that are closely tied to
domain-specific ontologies for annotation purposes [3,4]. Additionally, the lack of
precise models and formats for object and system representation and the high complexity of
multimedia processing algorithms make the development of fully automatic semantic
multimedia analysis and management systems a challenging task. This difficulty is often
referred to as the semantic gap. The use of domain knowledge is probably the only way by
which higher-level semantics can be incorporated into techniques that capture semantic
concepts. Therefore, in this paper, a comprehensive method for video content analysis
based on a specific knowledge domain is proposed, built on Protégé, a well-established
ontology editor, and PhotoStuff, an annotation tool that allows users to mark up an
image or video key-frame with respect to concepts in an ontology.
We organize the remainder of the paper as follows: Section 2 gives an overview of
video analysis. Section 3 introduces the infrastructure of domain knowledge. As the
major part, Section 4 shows how to represent video content through a specific domain
ontology; it contains two sub-sections: ontology building, and mapping from low-level
features to high-level semantics for video knowledge representation. Analysis results
for video content retrieval are shown in Section 5. After these explanations, we
conclude in Section 6.
time or space. We consider a shot that contains a series of actions expressing one
meaningful event in the video as one knowledge domain.
Since there are three frame types (I, P, and B) in an MPEG bit stream, we first
propose a technique to detect the scene cuts occurring on I frames; the shot boundaries
obtained on the I frames are then refined by detecting the scene cuts occurring on
P and B frames. For I frames, the block-based DCT is used directly as
F(u, v) = \frac{C_u C_v}{4} \sum_{x=0}^{7} \sum_{y=0}^{7} I(x, y)\, \cos\frac{(2x+1)u\pi}{16}\, \cos\frac{(2y+1)v\pi}{16}   (1)

where

C_u, C_v = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}   (2)
One finds that the dc image [consisting only of the dc coefficient (u=v=0) for each
block] is a spatially reduced version of I frame. For a MPEG video bit stream, a se-
quence of dc images can be constructed by decoding only the dc coefficients of I
frames, since dc images retain most of the essential global information of image
components.
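The sketch below implements Eq. (1) directly for one 8×8 block and assembles a dc image from the F(0,0) coefficients. It assumes the I frame is already available as a grayscale array whose sides are multiples of 8; a real decoder would read the dc coefficients from the MPEG bit stream instead of recomputing the DCT.

# Sketch of Eq. (1) and of building a dc image from the DC coefficients.
import numpy as np

def block_dct(block):                      # block: 8x8 array I(x, y)
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = sum(block[x, y]
                    * np.cos((2 * x + 1) * u * np.pi / 16)
                    * np.cos((2 * y + 1) * v * np.pi / 16)
                    for x in range(8) for y in range(8))
            F[u, v] = cu * cv / 4.0 * s
    return F

def dc_image(frame):                       # spatially reduced version of the I frame
    h, w = frame.shape
    return np.array([[block_dct(frame[i:i + 8, j:j + 8])[0, 0]
                      for j in range(0, w, 8)]
                     for i in range(0, h, 8)])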
Yeo and Liu have proposed a novel technique for detecting shot cuts on the basis
of the dc images of an MPEG bit stream [5], in which the shot cut detection threshold is
determined by analyzing the difference between the highest and second highest
histogram difference in a sliding window. In this article, an automatic dc-based
technique is proposed which adapts the threshold for shot cut detection to the activities of
various videos. The color histogram differences (HD) among successive I frames of an
MPEG bit stream can be calculated on the basis of their dc images as
HD(j, j-1) = \sum_{k=0}^{M} \left[ H_{j-1}(k) - H_j(k) \right]^2   (3)
where H_j(k) denotes the dc-based color histogram of the jth I frame, H_{j-1}(k) denotes
the dc-based color histogram of the (j-1)th I frame, and k is one of the M potential
color components. The temporal relationships among successive I frames in an MPEG
bit stream are then classified into two opposite classes according to their color
histogram differences and an optimal threshold Tc.
The optimal threshold Tc can be determined automatically by using the fast searching
technique given in Ref. [5]. The video frames (including the I, P, and B frames)
between two successive scene cuts are taken as one video shot. The following figures
show the shot we detected using the algorithm mentioned above; the shot contains a
series of I frames.
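A minimal sketch of this step is given below: Eq. (3) is applied to the dc-based histograms of successive I frames and a cut is declared where the difference exceeds the threshold. The automatic search for the optimal threshold Tc in Ref. [5] is not reproduced; Tc is taken here as a given parameter.

# Sketch of shot-cut detection from dc-based color histograms (Eq. (3)).
import numpy as np

def histogram_difference(h_prev, h_curr):          # Eq. (3)
    return float(np.sum((np.asarray(h_prev) - np.asarray(h_curr)) ** 2))

def detect_shot_cuts(histograms, Tc):
    # histograms[j] is the color histogram of the dc image of the j-th I frame;
    # Tc is assumed to have been determined beforehand (e.g. as in Ref. [5]).
    cuts = []
    for j in range(1, len(histograms)):
        if histogram_difference(histograms[j - 1], histograms[j]) > Tc:
            cuts.append(j)                          # scene cut before I frame j
    return cuts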
In order to achieve our aforementioned aims for bridging the chasm existing be-
tween the high and low levels, we propose a comprehensive ontology infrastructure
which is based on the MPEG-7 scheme that was analyzed above, and details will be
described as follows. The summarized knowledge infrastructure can be divided into
two major parts.
The first part is the domain ontology, which, in the multimedia annotation framework,
is meant to model the content layer of multimedia content with respect to specific
real-world domains, such as sports events like the billiard game considered as the
example in this paper. We want to extract semantic information from an image, but to
bridge the gap between high-level concepts and low-level features, a domain ontology
is needed. As the figure shows, the middle part, the “Billiard Ontology” in the
knowledge domain of sports, plays the important role of mapping. The ontology is
structured as the middle of Figure 4 shows. It contains significant classes such as Event,
Action, Agent and so on, together with their corresponding instances; for example, the
class Agent has instances such as ball, player, and table, each with the property values
we call “low-level features”.
The other part is to represent how the visual characteristics are associated with a
concept [8, 9]. One has to employ several different visual properties depending on the
concept at hand. For instance, in the billiard domain as was described in the scenario
in the aforementioned section, the billiard ball might be described using its shape (e.g.
round), color (e.g. white or red), or in case of video sequences, motion.
The low-level features automatically extracted from the resulting moving objects
are mapped to high-level concepts using an ontology in a specific knowledge domain;
this mapping, combined with a relevance feedback mechanism, is the main contribution
of this part.
In this study, ontologies [12] are employed to facilitate the annotation work using
semantically meaningful concepts (semantic objects). Figure 6 displays the hierarchical
concepts of the video shot ontology, which is readily extensible to describe common
video clips in similar knowledge domains. This simple ontology gives a structural
framework for annotating the key frames within one shot, using a vocabulary of
intermediate-level descriptor values to describe semantic objects’ actions in video
metadata.
The Protégé environment comprises three parts: the Asserted Ontology Hierarchy,
in which we have defined the billiard game domain ontology files in OWL [13]; the
Class Editor; and the Asserted Conditions and Properties Definitions areas. OWL
classes, such as Action, Event, Agent, etc., are interpreted as sets that contain
individuals. They are described using formal (mathematical) descriptions that state
precisely the requirements for membership of the class. For example, the class
“BilliardGame-Agent” would contain all the individuals (ball, player, table, stick,
audience, etc.) that take part in a billiard game in our domain of interest. This ontology
allows assertions to be made stating that an image contains a region that depicts
certain concepts, as sketched below.
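The sketch below illustrates (using rdflib) the kind of assertion meant here: a key-frame contains a region that depicts a concept of the billiard ontology. The namespace, class and property names are hypothetical and do not reproduce the paper's actual ontology.

# Hypothetical sketch of asserting that an image region depicts an ontology concept.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

BIL = Namespace("http://example.org/billiard#")          # assumed ontology namespace

g = Graph()
keyframe = URIRef("http://example.org/frames/52")
region = URIRef("http://example.org/frames/52/region1")

g.add((region, RDF.type, BIL.Region))                     # the key-frame has a region
g.add((keyframe, BIL.hasRegion, region))
g.add((region, BIL.depicts, BIL.CueBall))                 # the region depicts a concept
print(g.serialize(format="turtle"))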
Through this domain ontology, we map from the low-level features to the high-
level semantic concepts for video knowledge representation. In previous work, a
local billiard game ontology was pre-specified in OWL, defining a small set of
concepts for video key-frames, regions, depictions, etc. In order to exploit the billiard
domain ontology, we use the PhotoStuff software [10] as an assisting tool for video
annotation. PhotoStuff is a platform-
independent (written in Java) image annotation tool which uses an ontology to
provide the expressiveness required to assert the contents of an image, as well as
information about the image (date created, etc.). The annotation work proceeds as
follows:
Firstly, we load the previously defined OWL file from our local server; then we
import the key-frame (#52) of the billiard game video from the local server directory as
well. When these two elements are prepared, we begin to annotate the objects displayed
in the key-frame by specifying their regions. For example, to annotate the “cue ball”
object, we choose the rectangle drawing tool to highlight it, and the object’s
corresponding properties appear on the right side, as Figure 7 shows. In the “instance
form” shown in the figure, we can choose the properties for this object, which are
already defined in the ontology, and the same method applies to the other objects in
this key-frame. Therefore, the annotation can be completed semi-automatically,
assisted by the billiard game ontology.
ontology concepts. After that, we can link the key-frame to the billiard video shot
obtained from the segmentation in the aforementioned section for video event
annotation. To view the RDF/XML syntax of the annotations, select Windows →
View RDF (as Figure 9 shows). The RDF output results from the image markup
performed for key-frame annotation on the Semantic Web.
Fig. 10. RDF graph for billiard game key-frame annotation content
A relevant object is an object of use to the user in response to his or her query. Let
Rel represent the set of relevant objects and Ret the set of retrieved objects. The
above measures can then be redefined in the following manner.
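In the usual formulation (stated here as an assumption consistent with the definitions of Rel and Ret above), the measures are

\mathrm{precision} = \frac{|Rel \cap Ret|}{|Ret|}, \qquad
\mathrm{recall} = \frac{|Rel \cap Ret|}{|Rel|}.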
Fig. 12. Recall and precision of ontology-based and key-word based research
6 Conclusions
In this paper, we proposed a novel method for video content analysis and description
founded on a knowledge domain ontology. We have presented a generic,
domain-independent framework for annotating and managing digital image content
using Semantic Web technologies. The adapted billiard ontology not only helps us
overcome the gap between low-level features and high-level semantics, but also
combines these two aspects in an efficient and flexible (expressive) manner. The
proposed approach aims at the formulation of a domain-specific analysis model
facilitating semantic video content retrieval. The experimental part demonstrated that
the method achieved a higher average retrieval rate based on the semantic ontology
than retrieval using keywords only.
Our future work includes the enhancement of the domain ontology with more complex
model representations and, especially for video object description, the use of more
complex spatio-temporal relationship rules to analyze moving features. We will also
carry out further technical work to improve our retrieval system and strengthen the
retrieval component.
References
1. S.-F. Chang, T. Sikora, and A. Puri. Overview of the MPEG-7 standard. IEEE Trans. on
Circuits and Systems for Video Technology, 11(6):688–695, June 2001.
2. Dupplaw, D., Dasmahapatra, S.,Hu, B., Lewis, P., and Shadbolt, N. Multimedia Distrib-
uted Knowledge Management in MIAKT. ISWC Workshop on Knowledge Markup and
Semantic Annotation.Hiroshima, Japan, November 2004.
3. Hollink, L., Schreiber, G., Wielemaker J., and Wielinga. B. Semantic Annotation of Image
Collections. In Proceedings of Knowledge Capture - Knowledge Markup and Semantic
Annotation Workshop 2003.
4. Schreiber, G., Dubbeldam, B., Wielemaker, J., and Wielinga, B. Ontology-Based Photo
Annotation. IEEE Intelligent Systems, 16(3) 2001.
5. B. L. Yeo and B. Liu, ‘‘Rapid scene change detection on compressed video,’’ IEEE Trans.
Circuits Syst. Video Technol. 5, 533–544, 1995.
6. M. G. Strintzis, S. Bloehdorn, S. Handschuh, S. Staab, N. Simou, V. Tzouvaras, K. Petridis, I.
Kompatsiaris and Y. Avrithis, “Knowledge representation for semantic multimedia content
analysis and reasoning,” in European Workshop on the Integration of Knowledge, Seman-
tics and Digital Media Technology, November 2004.
7. Multimedia content description interface-part 8: extraction and use of Mpeg-7 descriptors.
ISO/IEC 15938-8, 2002.
8. Kompatsiaris, V. Mezaris, and M. G. Strintzis. Multimedia content indexing and retrieval
using an object ontology. Multimedia Content and Semantic Web - Methods, Standards
and Tools Editor G.Stamou, Wiley, New York, NY, 2004.
9. A.G. Hauptmann. Towards a large scale concept ontology for broadcast video. In Proc. of
the 3rd int conf on Image and Video Retrieval (CIVR’04), 2004.
10. Christian Halaschek-Wiener, Nikolaos Simou and Vassilis Tzouvaras, “Image Annotation
on the Semantic Web,” W3C Candidate Recommendation, W3C Working Draft 22 March,
2006.
11. Matthew Horridge, Holger Knublauch and Alan Rector, “A Practical Guide To Building
OWL Ontologies Using The Protégé-OWL Plugin and CO-ODE Tools Edition 1.0,”
August 27, 2004.
12. A. T. Schreiber, B. Dubbeldam, J. Wielemaker, and B. Wielinga, “Ontology-based photo
annotation,” IEEE Intell. Syst., vol. 16, pp. 66–74,May-June 2001.
13. T. Adamek, N. O’Connor, and N. Murphy. Region-based Segmentation of Images Using Syn-
tactic Visual Features. In Proc. Workshop on Image Analysis for Multimedia Interactive
Services, WIAMIS 2005, Montreux, Switzerland, April 13-15, 2005.
An Ontological Infrastructure for the Semantic
Integration of Clinical Archetypes
Abstract. One of the basic needs of any healthcare professional is to be able to
access clinical information of patients in an understandable and normalized
way. The lifelong clinical information of any person supported by electronic
means configures his or her Electronic Health Record (EHR). There are currently
different standards for representing EHRs. Each standard defines its own
information models, so, in order to promote interoperability among
standard-compliant information systems, the different information models must
be semantically integrated. In this work, we present an ontological approach to
promote interoperability among CEN- and OpenEHR-compliant information
systems by facilitating the construction of interoperable clinical archetypes.
1 Introduction
One of the basic needs for any healthcare professional is to be able to access clinical
information of patients in an understandable and normalized way. The lifelong
clinical information of any person supported by electronic means configures his
Electronic Health Record (EHR). This information is usually distributed among
several independent and heterogeneous systems that may be syntactically or
semantically incompatible. There are currently different standards for representing
electronic healthcare records (EHR). Each standard defines its own information
models and manages the information in a particular way. This implies that clinical
information systems of different clinical organizations might differ in the way
electronic healthcare records are managed. Hence, exchanging healthcare information
among health professionals or clinical information systems is nowadays a critical
process for the healthcare sector. Due to the special sensitivity of medical data and its
ethical and legal constraints, this exchange must be done in a meaningful way,
can play different roles in the healthcare process. Their main use is to support clinical
care. In the last years, different working groups have been actively working in the
definition of architectures and information models for electronic healthcare records.
Each model implies a working environment in which the meaning of data varies. This
requirement is fulfilled by semantic technologies, which make the description of the
nature and logical context of the information to exchange possible, allowing each
system to remain independent. An Electronic Healthcare Record (EHR) is a
healthcare record digitally stored in one or more information systems.
The OpenEHR consortium has developed the dual model architecture approach [1]
for electronic healthcare records. This architecture is based on the metamodelling of
healthcare records, and it is based on the separation of concepts in two levels: (1)
reference model (RM), and (2) archetypes, which are formal models of clinical
concepts. The information system is based on the RM and the valid healthcare records
extracts are instances of this reference model. This methodology was tested in the
Good Electronic Healthcare Record project (GEHR) in Australia and by the European
Synex project. It is also used in the new version of HL7 and in the CEN norm for the
communication of healthcare records.
The reference model represents the global features of the annotations of healthcare
records, how they are aggregated, and the context information required to meet
ethical, legal and other requirements. This model defines the set of classes that form the
generic building blocks of the electronic healthcare record and it contains the non-
volatile features of the electronic healthcare record. However, the reference model
needs the complement of domain knowledge: archetypes. An archetype models the
common features of types of entities and, therefore, defines the valid domain
structures. Archetypes restrict the business objects defined in a reference model,
bridging the generality of business concepts defined in the reference model and the
variability of the clinical practice. They provide a standard tool to represent this issue.
Archetype instances are expressed in an archetype definition language (ADL) and
are therefore related to a formal archetype model, which is formally related to the
reference model. Although both ADL and the archetype model are stable, individual
archetypes can be modified in order to be adapted to clinical practice.
The work developed in projects such as the previously mentioned GEHR and
OpenEHR suggests that the formalisms for defining archetypes must be based on the
following main technical principles: (1) each archetype is a different and complete domain
concept; (2) archetypes are expressed as restrictions on the reference model; (3) the
granularity of an archetype corresponds to the granularity of a business concept of the
reference model; (4) each business concept can be considered a descriptor of a domain
ontological level; and (5) archetypes have partonomic and taxonomic components. Having
introduced how electronic healthcare records can be represented at conceptual level, let us
describe the two current EHR standards that are based on a dual model approach:
CEN/TC251 EN13606 and OpenEHR on which we are focusing in this research work
both at information level (reference model) and knowledge model (archetypes).
2.1 CEN
activity of one of its working groups is devoted to the standardization of the architecture
and information models for electronic healthcare records. The overall goal of this
standard is to define a rigorous and durable information architecture for representing the
EHR, in order to support the interoperability of systems and components that need to
interact with EHR services: as discrete systems or as middleware components, to
access, transfer, add, or modify health record entries, via electronic messages or
distributed objects, preserving the original clinical meaning intended by the author,
reflecting the confidentiality of that data as intended by the author and patient.
This standard will have five parts: (1) a generic information model for communicating
the electronic healthcare record of any one patient (reference model); (2) a generic
information model and language for representing and communicating the definition of
individual instances of archetypes (archetype exchange specification); (3) a range of
archetypes reflecting a diversity of clinical requirements and settings, as a “starter set”
for adopters and to illustrate how other clinical domains might similarly be
represented (reference archetypes and term lists); (4) the information model concepts
that need to be reflected within individual EHR instances to enable suitable
interaction with the security components (security features); and (5) a set of models
built on the above parts that can form the basis of message-based or service-based
communication (exchange models).
This model makes use of the dual approach for communicating the electronic
healthcare record. Here, an archetype is defined as a computable expression of a
clinical domain concept based on a reference model. It is defined through a set of
structured restrictions. Archetypes share the same formalism, but they can be of
different types. Definitional archetypes are part of a standardized, shared ontology.
Non definitional archetypes are locally used and defined by particular institutions to
fulfil particular clinical needs. However, clinical organizations should agree on
common definitions in order to exchange clinical information efficiently. Therefore,
obtaining the mappings between these archetypes might be of interest.
2.2 OpenEHR
by openEHR, the following can be pointed out: life-long EHR; priority to the patient
/ clinician interaction; technology and data format independent; facilitation of EHRs
sharing via interoperability at data and knowledge levels; integration with
any/multiple terminologies; support for clinical data structures: lists, tables, time-
series, including point and interval events; compatibility with CEN 13606, Corbamed,
and messaging systems.
The most important factors that make the integration of and interoperability between
systems difficult are semantic and structural heterogeneity, as well as the different
meanings information has in different systems. Hence, our interest is focused on how
semantic technologies, in particular ontologies, may support and promote
interoperability among electronic healthcare records systems.
An ontology can be seen as a semantic model containing concepts, their properties,
interconceptual relations, and axioms relating these elements. Furthermore, an
ontology provides a standard reference model for integrating information, an activity
known as knowledge sharing. In practical settings, ontologies have become widely
used due to the advantages they offer (see for instance [3]). On the one hand, ontologies
are reusable, that is, the same ontology can be reused in different applications, either
individually or in combination with other ontologies. On the other hand, ontologies
are shareable, that is, their knowledge allows for being shared by a particular
community. In the context of integration, they facilitate the human understanding of
the information. Ontologies allow for differentiating among resources, and this is
especially useful when there are resources with redundant data. Thus, they help to
fully understand the meaning and context of the information. This is important for our
objective of achieving semantic interoperability among electronic healthcare record
systems built on top of different information models. For our purpose, the information
model semantics is formalized by means of ontologies and represented by using the
Ontology Web Language (OWL) [18].
Ontologies have already been used for integration and interoperability purposes in
medical domains. In [8], ontologies were used to promote integration and
interoperability between information systems for three medical communities by
combining data with HL7 [14] and terminologies such as UMLS [15], MEDCIN [17]
and SNOMED [16]. So, terminologies are integrated by using ontologies. Our
approach is different because EHR standards have a different nature and the
components defined in clinical archetypes can be linked to different terminologies.
Therefore, our work can benefit from terminological integration approaches such as
[8] in order to simplify the management of different terminologies at EHR level.
Another example is the joint effort made by the ONTOLOG [19] forum, the Medical
Informatics department of Stanford University and the Semantic Interoperability
Community of Practice (SICOP) [20] to integrate and make the Federal Health
Architecture and the National Health Information Network interoperable. They
defined a three-level ontological architecture. The middle-level ontologies were
FEA-RMO (Federal Enterprise Architecture Reference Model Ontology) and HL7
RIM, and different domain ontologies were obtained from the Federal Health
Architecture. This effort was carried out in the context of HL7, so different EHR
models were not targeted as we do in this work.
Figure 1 shows the set of ontologies that configure our ontological architecture to
solve this problem, as well as the relations existing among them. This figure has three
main areas. The left and right areas contain specific information of each reference
model (i.e., CEN, OpenEHR). Each one defines a healthcare record information
model, including a set of types of items. For instance, clusters and elements are types
of CEN items. The OpenEHR is richer in terms of types of items. The definition of
mappings between types of items is possible, although some semantic processing is
needed. These items define the nature of clinical actions that can be represented by
clinical archetypes. The central part of the figure represents the common parts to both
standards. The global reference model is located on top of the figure. This model will
allow us to translate information between both models. The archetype model is
located in the central part of the figure. This is common for both standards, and it
makes use of the ontologies of assertions and primitive data types, which are
semantically equivalent for both standards. Furthermore, the archetype model will be
used to build archetypes, and these will be used to generate the archetype instances
(one per patient). These instances are contained in the extract of the electronic
healthcare record of the patients. The lower part of the figure refers to the types of
clinical items defined in both models.
Hence, two major mappings have to be made between: (1) the types of items in
order to be capable of translating archetypes to a different model; (2) the concepts that
describe the electronic healthcare record in order to translate EHR extracts. This
architecture facilitates different actions, such as:
• Model-independent definition of electronic healthcare records
• Model-independent definition of clinical archetypes
• Automatic translation of EHR extracts from CEN to OpenEHR and vice versa
• Automatic translation of archetypes from CEN to OpenEHR and vice versa
• Semantic interoperability between CEN-based and OpenEHR-based systems
Our work was mainly focused on the reference and archetype models of both CEN
and OpenEHR standards. Each model was represented using OWL to obtain its
ontological representation. Then, the ontological information and archetype models
were compared in order to find similarities and differences among the CEN and
OpenEHR representations. It is more appropriate to perform this comparison at the
ontological level for several reasons. First, ontologies are formal models, so formal
reasoning can be performed on both models. Second, representing both models using
the same formalism provides a common representation framework for the comparison
process. Third, if we want to arrive at an integrated model, it is more appropriate to
have the components represented with the same formalism and at the same
granularity level.
Hence, let us discuss the conclusions drawn from the analysis of the information and
archetype models. First, the representation of the information models is oriented more
towards transmission via a communication network than towards representing contents
semantically. In fact, the different model diagrams provided in the documentation of
both standards have little semantic information; they are similar to UML class diagrams.
This can be observed in the archetype model defined in the CEN [12] or OpenEHR
documentation [13]. For instance, there are some references to elements belonging to
different classes modelled by string attributes. This representation may make it harder to
translations of archetype terms to such language. Therefore, each archetype term has a
set of translations associated. Each archetype term has also a definition, and a set of
term bindings to the available terminologies. Figure 2 shows a part of the ontological
representation of the archetype model. This part of the ontology reflects common
information to CEN and OpenEHR. Archetype terms can refer to restrictions and
conceptual entities. Conceptual terms (called ontology terms) are divided into concepts
(e.g., heart rate), complex terms (e.g., list, history), simple terms (e.g., position, device),
or values (e.g., sitting, lying). A simple term has a set of values associated. Each
complex term is comprised of a set of complex and simple terms. Values are of a
particular datatype, which is given by the reference model (CEN / OpenEHR). Both
standards use the same basic datatypes, but have different simple and complex terms.
This part of the ontology is shown in Figure 3. Besides the modeling of this integrated
archetype model, the reference model of the CEN and OpenEHR standards have been
ontologically modeled. For this purpose, the procedure was similar to the one followed
for the archetypes model. They were analyzed in order to detect semantic representation
flaws and OWL schemes were developed. Then, both ontologies were semantically
compared in order to look for mapping between both standards to develop an integrated
model for the electronic healthcare records. The main difference is the richness for
defining types of clinical data. The CEN model makes use of folders, sections, entries,
items, clusters and elements, whereas the OpenEHR model uses a wider range of types,
including some such as history, item list, item structure, and so on. We are currently
completing this mapping, which would allow CEN electronic healthcare records and
archetypes to be transformed automatically into OpenEHR ones and vice versa. Our
research group has been working on the modelling of the different standards and the
definition of the mappings for the last year. The set of OWL ontologies and mappings
obtained through this research project is available at http://klt.inf.um.es/~poseacle.
A “problem” has a “structure” and a “protocol”. The structure of a problem has a
description of the problem, a “date of initial onset”, “age at initial onset”, “severity”,
“clinical description”, “date clinically recognised”, “location”, “aetiology”, “occurrences
or exacerbations”, “related problems”, “date of resolution”, and “age at resolution”.
These archetype terms must be assigned a type of term. For this purpose, a set of types
are available, those belonging to OpenEHR and CEN. Provided that CEN types are a
subset of OpenEHR ones, the latter ones can be used as the ones proposed to the
archetype builder and then, these can be easily mapped onto CEN types by considering
complex OpenEHR types as CEN Cluster and the simple ones as CEN Element. Let us
consider the definition of the “protocol”. A protocol is comprised of a set of “references”,
each having a “reference” and a “web link”. The following code of the protocol is shown
in ADL, where ITEM_TREE, CLUSTER, and ELEMENT refer to the corresponding
OpenEHR types; at00xx stands for the terms associated with the entities.
protocol matches {
    ITEM_TREE[at0032] matches {  -- Tree
        items cardinality matches {0..*; unordered} matches {
            CLUSTER[at0033] occurrences matches {0..1} matches {  -- References
                items cardinality matches {0..*; unordered} matches {
                    ELEMENT[at0034] occurrences matches {0..*} matches {  -- Reference
                        value matches { TEXT matches {*} } }
                    ELEMENT[at0035] occurrences matches {0..*} matches {  -- Web link
                        value matches { URI matches {*} }}}}}}}
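As a rough sketch of the type mapping described above (complex OpenEHR types to CEN Cluster, simple ones to CEN Element), the fragment below applies it to a protocol-like structure; the nested-dictionary representation and the list of complex types are assumptions for illustration.

# Sketch of mapping OpenEHR item types onto CEN types; the node representation
# (nested dicts) and the set of complex types are illustrative assumptions.
COMPLEX_OPENEHR_TYPES = {"ITEM_TREE", "CLUSTER", "HISTORY", "ITEM_LIST", "ITEM_STRUCTURE"}

def to_cen_type(openehr_type):
    return "CLUSTER" if openehr_type in COMPLEX_OPENEHR_TYPES else "ELEMENT"

def map_node(node):
    # node: {"type": "ITEM_TREE", "code": "at0032", "children": [...]}
    return {"type": to_cen_type(node["type"]),
            "code": node["code"],
            "children": [map_node(c) for c in node.get("children", [])]}

protocol = {"type": "ITEM_TREE", "code": "at0032", "children": [
    {"type": "CLUSTER", "code": "at0033", "children": [
        {"type": "ELEMENT", "code": "at0034"},      # Reference
        {"type": "ELEMENT", "code": "at0035"}]}]}   # Web link
print(map_node(protocol))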
6 Conclusions
Acknowledgements
This work has been possible thanks to the Spanish Ministry for Science and Education
through the projects TSI2004-06475-C01 and TSI2004-06475-C02; and
FUNDESOCO through project FDS-2004-001-01.
References
1. Beale, T.: Archetypes, Constraint-based Domain Models for Future-proof Information
Systems (2001) Available: http://www.deepthought.com.au/it/archetypes/archetypes.pdf
2. Beale, T. and Heard, S. The openEHR Archetype System (2003) Available:
http://www.openehr.org/
3. Fernández-Breis, J.T., Martínez-Béjar,R.: A Cooperative Framework for Integrating
Ontologies. International Journal of Human-Computer Studies, 56(6) (2002) 662-717
4. Linthicum, D.: Leveraging Ontologies: The Intersection of Data Integration and Business
Intelligence Part I . DMR Review Magazine (2004) June.
5. Missikoff, M.: Harmonise: An Ontology-based Approach for Semantic Interoperability.
ERCIM News 51 (2002)
Abstract. The most common applications of neural networks to control problems
are automatic controls that use an artificial perceptual function. These control
mechanisms are similar to the intelligent, pattern-recognition-based adaptive
control frequently found in nature. Many automated buildings control their
HVAC (Heating, Ventilating and Air Conditioning) systems with PI controllers,
which are simple and robust. However, to maintain good performance, proper
tuning and re-tuning are necessary. In this paper, as one method to solve these
problems and improve the control performance of the controller, a reinforcement
learning controller is proposed, using reinforcement learning, one of the neural
network learning paradigms (supervised/unsupervised/reinforcement learning).
Its validity is evaluated under real operating conditions of an AHU (Air Handling
Unit) in an environment chamber.
1 Introduction
Although modern control theory has developed rapidly, most industrial controllers
for air conditioners, refrigerators, etc. are of the PID (Proportional Integral Derivative)
type. In spite of its simple structure, the PID controller is the most widely used for
industrial process control because it performs well, with stability and robustness, in
setpoint following and in the presence of external disturbances and process-variable
changes. In addition, the dynamics of the plant can be extracted experimentally by
using several tuning methods, and the controller can be designed by searching for the
variables that give optimal control [1,2,3].
However, it cannot estimate the control performance in advance if there is uncertainty
in the process model or a change in the operating environment. It is necessary to have an
* This paper has been supported by the 2006 Hannam University Research Fund.
exact estimate of the control environment, which changes continuously, and an automatic
tuning function, especially in the case of automatic building control, which typically
exhibits non-linear behaviour. In order to choose the PID controller parameters that give
this optimal control performance, there have been many studies on PID controller tuning,
such as the Ziegler-Nichols tuning [4] in 1942 and the relay-experiment tuning by Åström
and Hägglund, and a few methods improved from the Ziegler-Nichols tuning are in
use [5,6,7,8,9].
Therefore, this study designs an optimal building air-conditioning controller by
using Q-learning to solve the problems above and improve the control performance of
the controller. Q-learning is a model-free reinforcement learning method based on
stochastic dynamic programming. It is applied to the air-conditioning system of an
artificial climate laboratory in which the outdoor temperature can be controlled freely
and artificially.
2 Reinforcement Learning
Fig. 2 shows an overview of the structure combining reinforcement learning with the
PI control algorithm. As can be seen from Fig. 1 and Fig. 2, reinforcement learning
performs on-line learning through two structural elements: the actor takes an action
suited to the given environment, and the environment, given that action, returns to the
actor a judgement of whether the changed state and action were right, in the form of a
reinforcement signal that acts as a reward for the output of the PI controller.
[Figure: block diagram of the plant under PI control, showing the control signals, disturbance signals, and process variables]
3 Experimental Devices
A test house was built in the artificial climate laboratory building so that comprehensive
experiments could be carried out, such as building air-conditioning and heating load,
air-conditioning and heating efficiency, thermal environment, energy saving, heat
transfer through the building structure, wall thermal mass effects, HVAC control,
access floor control, and so on.
[Figure: AHU control schematic with return fan (R.F), supply fan (S.F), damper actuator, AHU controller, heating (H) and cooling (C) coils, and zone pressure controller; legend: control signal, sensor signal, temperature sensor, pressure sensor, AO = analog output, AI = analog input]
Fig. 4 shows the composition of the supervisory operating control system for the
automatic operation of the air-conditioning system of the test house. The system is
composed of the established supervisory operating control system and a separate
supervisory operating control system for control-algorithm performance experiments.
The established supervisory operating control system lets the supervisory control of the
main computer and the local loop control perform real-time data supervision and
operating control through an Ethernet TCP/IP data interface, but it is limited for
experimenting with various actual control algorithms. Therefore, a supervisory
operating control system has been implemented which can compare and analyse
performance characteristics through control-algorithm development and application
and controller verification experiments, by composing an independent data interface
and supervisory control system and performing automatic control.
4 Experimental Results
In order to compare and analyse the control performance characteristics of the
reinforcement learning controller through a verification experiment, a performance
experiment was carried out using the VAV AHU in the test building and compared
with the established PID controller. In this experiment, the control performance of the
heating coils for supply air temperature control was examined, to assess the
applicability to the real system before applying the approach to the whole system. The
conditions for the experiment were as follows: the outside air temperature was
-1℃ ~ 0℃, the mixed air temperature of the test building was 22℃ < Tma (temperature
of mixed air) < 28℃, and the supply air temperature was 33℃ < Tsa (temperature of
supply air) < 43℃. Under these conditions, the system was operated and the controller
performance experiment was carried out. Before the performance experiment, in order
for the system to operate with more stable and precise control, the optimal control
parameters of the PI controller, a proportional element Kp and an integral element Ki,
were obtained using the Ziegler-Nichols tuning method [3] and by testing various
types of loops. The reinforcement learning controller was then added to the PI
controller; the PI controller for the heating coils used a proportional element Kp = 1.9
and an integral element Ki = 7.5, and the controller was designed through operating
control performance experiments using the output reward control signal of the
reinforcement learning controller.
Seven discrete output signals, [-2, -1, -0.2, 0, 0.2, 1, 2], were chosen as the output
reward control signals of reinforcement learning (RL) and added to the output control
signal of the PI controller. Each of the three input variables was divided by 8 space
limits into 7 intervals, so the 3-dimensional input space was fixed at 7³ (343) cells.
Each input cell stores a Q-value for each of the 7 discrete output signals. The
reinforcement learning update is given below.
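The update referred to is presumably the standard Q-learning rule of Watkins and Dayan [13],

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],

where α is the learning rate and γ the discount factor. A minimal sketch of the tabular actor whose output is added to the PI output follows; the learning rate, discount factor, exploration scheme and reward definition are assumptions, not values given in the paper.

# Sketch of the tabular Q-learning actor whose discrete output is added to the
# PI controller output. The 343 (7^3) input cells and the 7 reward control
# signals follow the description above; alpha, gamma and epsilon are assumptions.
import numpy as np

ACTIONS = [-2, -1, -0.2, 0, 0.2, 1, 2]          # discrete RL output signals
N_STATES = 7 ** 3                                # 3 input variables x 7 intervals each
Q = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.9, 0.1                # assumed learning parameters

def select_action(state):
    if np.random.rand() < eps:                   # epsilon-greedy exploration
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q[state]))

def control_signal(pi_output, action_index):
    return pi_output + ACTIONS[action_index]     # RL correction added to PI output

def update(state, action_index, reward, next_state):
    td_target = reward + gamma * np.max(Q[next_state])      # Q-learning update
    Q[state, action_index] += alpha * (td_target - Q[state, action_index])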
Fig. 5. Control performance when the selected action of RL agent is added to PI controller
controller after learning is completed. With the fully learned optimal controller, that is,
when the RL controller is used, there is a large decrease in steady-state error compared
to the PI controller when the supply air temperature set point changes with the outside
environment, and the controller's performance improves with a quicker response.
Fig. 7 shows the control performance characteristics of the heating coils both for the
PI controller with control parameters Kp = 1 and Ki = 4, which has a somewhat
increased rise time, and for the combined PI and RL controller. A clear improvement in
control performance can be seen, such as a decrease in steady-state error and a quicker
response than the PI controller when the supply air temperature changes with the
environment. Also, the more learning iterations are performed, the smaller the
performance degradation from learning errors, so it is certain that optimal control of
the heating coils is possible.
Fig. 8 compares the performance when an RL controller with remaining learning error
is used together with the PI controller: the response is slightly improved. However, the
steady-state error increased far more than when using only the PI controller, because
oscillation of the heating-coil output control signal occurred due to convergence
problems near the boundary values of the input variables used for learning.
This experiment shows that, to apply a controller incorporating the RL controller to an
actual system, there must first be sufficient learning, appropriate selection of input
variables according to environment changes, a suitable boundary range for each
variable, and effective selection of the output reward signal values of each system, in
order to develop a controller with optimal performance.
5 Conclusion
As a result of the experiment in which the PI controller and the reinforcement learning
controller were applied to an actual system, as the number of learning iterations
increases, the RL controller increases or decreases the output of the PI controller when
the supply air temperature changes with the outside environment. It does not decrease
the steady-state error between the set point and the supply air temperature. However, it
clearly has a rather quick response, so the output control signal of the heating coils
improves considerably as it tracks the set point.
In order to apply a controller in which the RL controller is connected to an actual
system, there must be sufficient learning, appropriate selection of input variables, a
suitable space range for each variable, and effective setting of the output reward control
signal values for each system. Otherwise, there may be regions at the boundary
conditions of the variables where no learning is achieved, or the controller may fail to
converge because of oscillation caused by learning errors, and this can degrade the
controller's performance. However, the controller can achieve optimal performance for
the purposes of each system if appropriate input variables, reward output signals, and
numbers of learning iterations are chosen.
A controller with a reinforcement learning control algorithm adapts quickly to
environment changes and is therefore able to improve the performance of the controller.
Optimizing Dissimilarity-Based Classifiers Using
a Newly Modified Hausdorff Distance
Sang-Woon Kim
1 Introduction
One of the most recent and novel developments in the field of statistical Pattern
Recognition (PR) [1] is the concept of Dissimilarity-Based Classifiers (DBCs)
proposed by Duin and his co-authors (see [2], [3], [4], [5], [7]). (This work was generously supported by the KOSEF, the Korea Science and Engineering Foundation, grant F01-2006-000-10008-0.) Philosophically, the motivation for DBCs is the following: If we assume that “Similar” objects
can be grouped together to form a class, the “class” is nothing more than a set
of these “similar” objects. Based on this idea, Duin and his colleagues argue
that the notion of proximity (similarity or dissimilarity) is actually more funda-
mental than that of a feature or a class. Indeed, it is probably more likely that
the brain uses an intuitive DBC-based methodology rather than that of taking
measurements, inverting matrices etc. Thus, DBCs are a way of defining classi-
fiers between the classes, which are not based on the feature measurements of
the individual patterns, but rather on a suitable dissimilarity measure between
them. The advantage of this methodology is that since it does not operate on the
class-conditional distributions, the accuracy can exceed the Bayes’ error bound
and actually attempt to attain the zero-error bound1 - which is, in our opinion,
remarkable. Another salient advantage of such a paradigm is that it does not
have to confront the problems associated with feature spaces, such as the “curse
of dimensionality” [1], and the issue of estimating a large number of parameters.
However, DBCs have several problems to be solved when being applied for
particular tasks, such as face recognition [11]. The questions encountered in
designing the DBCs are summarized as follows: (1) How to select (or create)
prototype subsets from the training samples. (2) How to measure the dissimilar-
ities between object samples. (3) How to design a classifier in the dissimilarity
space. The existing strategies that have been investigated to answer the above
questions are described in the following.
First of all, using all of the input vectors as prototypes is a simple way to select
prototype subsets. However, in most cases, this will impose a computational bur-
den on the classifier. Recently, Duin and his colleagues [3], [7] discussed a number
of methods including Random, Random C, KCentres, where a training set, T ,
is pruned to yield a set of representative prototypes, Y , where, without loss of
generality, |Y | ≤ |T |. Additionally, by invoking a Prototype Reduction Scheme
(PRS), Kim and Oommen [9] also obtained the representative subset, Y , which
is utilized by the DBC. Apart from utilizing PRSs, in [9], they have also pro-
posed simultaneously employing the Mahalanobis distance as the dissimilarity-
measurement criterion to increase the DBC’s classification accuracy.
Secondly, regarding the second question, investigations have focused on mea-
suring the appropriate dissimilarity using various Lp Norms (including the Eu-
clidean and L0.8 ), the Hausdorff and Modified Hausdorff norm, traditional PR-
based measures, such as those used in Template matching, and Correlation-based
analysis. The final question refers to the learning paradigms, especially those which deal with either non-metric or non-Euclidean dissimilarities. Since dissimilarity representations can be interpreted in vector spaces, the tools available for feature representations may be used for learning from the dissimilarity representation as well. The literature [5], [6] reports the use of many traditional decision classifiers.
1
The idea of the zero-error bound is based on the fact that dissimilarities may be
defined such that there is no zero distance between objects of different classes. Con-
sequently the classes do not overlap, and so the lower error bound is zero. We are
grateful to Bob Duin for providing us with insight into this.
2 Dissimilarity-Based Classification
The dissimilarity measures discussed in [5] can be grouped into: (a) Lp norms, such as the Euclidean and L0.8 norms; (b) the Hausdorff norm and its variants, which involve Max-Min computations; and (c) traditional pattern recognition norms, such as the Template matching and Correlation norms. The details of the other measures, such as the Median and Cosine, are omitted here in the interest of compactness, but can be found in [5].
In this method, the authors of [20] compute not only the distance between the point $a_t$ in the finite point set $A$ and the point with the same value, $b_t$, in the finite point set $B$, but also the distances between $a_t$ and its two neighboring values $b_{t-1}$ and $b_{t+1}$ in the finite point set $B$. Then, these three distances are minimized.
where $\sum_{a_t \in A} w(a_t) = N_{a_t}$; and
In Eq. (9), $k$ is the number of gray-levels in the neighborhood of the gray-level $t$. For instance, $k = 2$ means that one needs to compute the distances between the point $a_t$ and its four neighboring values, $b_{t-2}$, $b_{t-1}$, $b_{t+1}$, and $b_{t+2}$, as well as the distance between the point $a_t$ and the same value $b_t$, and then minimize these five distances.
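A minimal sketch of the neighborhood minimization just described, assuming a simplified representation in which each gray level t is associated with a single value A[t] (respectively B[t]); the weighting w(a_t) and the normalization are placeholders, since the full WGHD formula of [20] is not restated here.

    import numpy as np

    def directed_gray_distance(A, B, k=2, w=None):
        """Directed distance from gray-level point set A to B.

        A, B : 1-D arrays indexed by gray level t (illustrative representation).
        k    : number of neighboring gray levels considered on each side.
        w    : optional weights w(a_t); uniform weights are assumed if omitted.
        """
        A, B = np.asarray(A, float), np.asarray(B, float)
        n = len(A)
        if w is None:
            w = np.ones(n)
        total, weight_sum = 0.0, 0.0
        for t in range(n):
            # distances from a_t to b_t and its neighbors b_{t-k}, ..., b_{t+k}
            lo, hi = max(0, t - k), min(n, t + k + 1)
            d = np.min(np.abs(A[t] - B[lo:hi]))
            total += w[t] * d
            weight_sum += w[t]
        return total / weight_sum

    # Example: with k = 2, five candidate distances are minimized for interior t.
    a = np.array([3.0, 5.0, 9.0, 4.0, 7.0])
    b = np.array([2.0, 6.0, 8.0, 5.0, 6.0])
    print(directed_gray_distance(a, b, k=2))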
In order to apply the above Hausdorff distance, WGHD, to the construction
of the dissimilarity matrix of DBCs, an input image is first translated into a
multi-level gray image. This translation can be performed by invoking a thresh-
olding algorithm, such as Wu’s multi-level thresholding algorithm [21], where
the number of thresholding levels to separate the image into segmented ones
is determined by measuring the separability of the homogenous objects in the
image. The details of the algorithm are omitted here, but can be found in [21].
4 Experimental Results
The proposed method has been tested and compared with conventional meth-
ods. This was done by performing experiments on a well-known benchmark face
database, namely, the AT&T database. The face database captioned “AT&T”, formerly the ORL database of faces, consists of ten different images of 40 distinct
subjects for a total of 400 images. Each subject is positioned upright in front of
a dark homogeneous background. The size of each image is 112 × 92 pixels for a
total dimensionality of 10304.
To construct the dissimilarity matrix, we first selected all training samples
as the representative set. Then, we measured the dissimilarities between each
sample and the prototypes with the dissimilarity measuring methods, such as the
Hausdorff Distance (HD), the spatially Weighted Hausdorff Distance (WHD),
the Gray-level Hausdorff Distance (GHD), and the spatially Weighted Gray-
level Hausdorff Distance (WGHD). Since the construction of the dissimilarity
matrix is a very time-consuming job, in this experiment, we constructed a 50×50
dissimilarity matrix, instead of a 400 × 400 matrix, after selecting five objects from
the AT&T database.7
7 We experimented with the simpler 50 × 50 matrix here. However, it should be men-
tioned that we can have numerous solutions, depending on the representative se-
lection, the dissimilarity measure, and the design of classifiers in the dissimilarity
space. Especially, regarding the dissimilarity measure, it is possible to construct the
dissimilarity matrix rapidly by employing a computational technique. From this per-
spective, we are currently investigating how the experiment with the full 400 × 400
matrix can be performed at a high speed.
In the HD and WHD experiments, an input gray image was translated into
a binary edge image by invoking an algorithm, in which edges were found by
looking for “zero crossings” after filtering the image. Also, in the WHD and
WGHD experiments, the weighting function of Eq. (5), w(x), was defined to take the values 1, 0.5, and 0 by using two thresholds obtained from the mean face, which is achieved
by simply averaging all the training images.
In this paper, all experiments were performed using the “leave-one-out” strat-
egy. To classify an image of an object, that image was removed from the training
set, and the dissimilarity matrix was computed with the n − 1 images. Following
this, all of the n images in the training set and the test object were translated
into a dissimilarity space using the dissimilarity matrix, and recognition was
performed based on the Nearest Neighbor (NN) rule. We repeated this n times
for every sample and obtained a final result by averaging them.
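A minimal sketch of the evaluation protocol described above: a dissimilarity matrix over the n − 1 remaining samples is built with a user-supplied measure, the left-out image is mapped into the same dissimilarity space, and the Nearest Neighbor rule decides its class. The Euclidean measure below is only a stand-in for the HD/WHD/GHD/WGHD measures, and the synthetic data are illustrative.

    import numpy as np

    def euclidean(x, y):
        # Placeholder dissimilarity; the paper uses Hausdorff-type measures.
        return np.linalg.norm(x - y)

    def leave_one_out_dbc(samples, labels, dissim=euclidean):
        """Leave-one-out accuracy of a 1-NN classifier in dissimilarity space."""
        samples, labels = np.asarray(samples, float), np.asarray(labels)
        n = len(samples)
        correct = 0
        for i in range(n):
            train = [j for j in range(n) if j != i]
            # Represent every training sample by its dissimilarities to the
            # n-1 prototypes (here: all remaining training samples).
            D_train = np.array([[dissim(samples[a], samples[b]) for b in train]
                                for a in train])
            d_test = np.array([dissim(samples[i], samples[b]) for b in train])
            # 1-NN rule in the dissimilarity space.
            nn = np.argmin(np.linalg.norm(D_train - d_test, axis=1))
            if labels[train[nn]] == labels[i]:
                correct += 1
        return correct / n

    # Tiny synthetic example (two classes of 3-D vectors).
    X = np.vstack([np.random.randn(10, 3), np.random.randn(10, 3) + 3.0])
    y = np.array([0] * 10 + [1] * 10)
    print("LOO accuracy:", leave_one_out_dbc(X, y))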
In order to investigate the advantage gained by utilizing the proposed method,
the following experiments have been conducted: First of all, the fifty facial images
of five individuals were randomly selected from AT&T database. Next, for the
facial images, the classification performance was evaluated with the HD, WHD,
GHD, and WGHD methods. After repeating the above two steps ten times, the
performances were finally averaged.
Table 1 shows a comparison of the averaged classification accuracy rates (%)
and the averaged processing CPU-times (seconds) for the fifty facial images.
Here, the fifty images have been translated into binary images for HD and WHD
methods. For GHD and WGHD methods, however, the ordinary 256 gray-level
input images have been used without any translation. Also, k = 8 was employed
as the number of gray-levels to be referred when computing the distance in (9).
From Table 1, it should be noted that it is possible to improve the perfor-
mance of DBCs by effectively measuring the dissimilarity. This improvement can
be seen by observing how the classification accuracy rates (%) and the process-
ing CPU-times (seconds) change. The results from Table 1 demonstrate that the
classification accuracies of WHD and WGHD are improved over those of HD and GHD, respectively, while there is only a marginal change in the processing CPU-times. From the table, it is also clear that the classification accuracy of DBCs is
the highest, while its standard deviation (σ) is the lowest when WGHD is used.
In review, among the four methods, it is not easy to crown one particular method with superiority over the others in solving the dissimilarity-measurement problem.
Table 1. A comparison of the averaged classification accuracy rates (%) and the av-
eraged processing CPU-times (seconds) of the Dissimilarity-Based Classifiers (DBCs).
The processing CPU-time of the second row is presented as an exponential form. For
example, 3.45e3 = 3.45 × 103 . Also, the numbers represented in the brackets of each
row are the standard deviations. The details of the table are discussed in the text.
5 Conclusions
In this paper, a method that seeks to optimize Dissimilarity-Based Classifiers
(DBCs) by using a newly modified Hausdorff distance, the spatially Weighted
Gray-Level Hausdorff Distance, was considered. To construct the dissimilarity
matrix of DBCs, the dissimilarity was measured directly from the input gray-
level image without extracting the binary edge image from it. Thus, the problems
caused by the binary edge map could be overcome. Also, instead of obtaining
the distance on the basis of the entire image, we employed the spatially weighted
mask by which the entire image region was divided into several subregions ac-
cording to their importance.
The proposed method has been tested on a well-known face database and com-
pared with conventional methods. The experimental results demonstrated that
the proposed scheme is better than conventional ones in terms of the classifica-
tion accuracies. Although this paper has shown that DBCs could be optimized
with the proposed Hausdorff distance, many tasks remain to be addressed. One of
them is to improve the classification efficiency by designing a suitable classi-
fier (i.e., linear or, possibly, quadratic classifier) in the dissimilarity space. The
research concerning this is a future aim of the authors.
References
1. A. K. Jain, R. P. W. Duin, J. Mao.: Statistical pattern recognition: A review.
IEEE Trans. Pattern Anal. and Machine Intell., PAMI-22(1) 4–7 2000.
2. R. P. W. Duin, D. Ridder, D. M. J. Tax.: Experiments with a featureless approach
to pattern recognition. Pattern Recognition Letters, 18 1159–1166 1997.
3. R. P. W. Duin, E. Pekalska, D. de Ridder.: Relational discriminant analysis. Pat-
tern Recognition Letters, 20 1175–1181 1999.
4. E. Pekalska, R. P. W. Duin.: Dissimilarity representations allow for building good
classifiers. Pattern Recognition Letters, 23 943–956 2002.
5. E. Pekalska.: Dissimilarity representations in pattern recognition. Concepts, theory
and applications. Ph.D. thesis, Delft University of Technology, Delft, The Nether-
lands, 2005.
6. Y. Horikawa.: On properties of nearest neighbor classifiers for high-dimensional
patterns in dissimilarity-based classification. IEICE Trans. Information & Systems,
J88-D-II(4) 813–817 2005.
7. E. Pekalska, R. P. W. Duin, P. Paclik.: Prototype selection for dissimilarity-based
classifiers. Pattern Recognition, 39 189–208 2006.
8. R. P. W. Duin.: Personal communication.
9. S. -W. Kim, B. J. Oommen.: On optimizing dissimilarity-based classification using
prototype reduction schemes. This paper will be presented at the ICIAR-2006, the
2006 International Conference on Image Analysis and Recognition, in Povoa de
Varzim, Portugal, in September 2006.
10. S. -W. Kim.: On using a dissimilarity representation method to solve the small
sample size problem for face recognition. This paper will be presented at the Acvis
2006, Advanced Concepts for Intelligent Vision Systems, in Antwerp, Belgium, in
September 2006.
11. P. N. Belhumeour, J. P. Hespanha, D. J. Kriegman.: Eigenfaces vs. Fisherfaces:
Recognition using class specific linear projection. IEEE Trans. Pattern Anal. and
Machine Intell., PAMI-19(7) 711–720 1997.
12. H. Yu, J. Yang.: A direct LDA algorithm for high-dimensional data - with appli-
cation to face recognition. Pattern Recognition, 34 2067–2070 2001.
13. P. Howland, J. Wang, H. Park.: Solving the small sample size problem in face
reognition using generalized discriminant analysis. Pattern Recognition, 39 277–
287 2006.
14. J. C. Bezdek, L. I. Kuncheva.: Nearest prototype classifier designs: An experimental
study. International Journal of Intelligent Systems, 16(12) 1445–1473 2001.
15. B. V. Dasarathy.: Nearest Neighbor (NN) Norms: NN Pattern Classification Tech-
niques. IEEE Computer Society Press, Los Alamitos, 1991.
16. S. -W. Kim, B. J. Oommen.: Enhancing prototype reduction schemes with LVQ3-
type algorithms. Pattern Recognition, 36 1083–1093 2003.
17. S. -W. Kim, B. J. Oommen.: Enhancing prototype reduction schemes with recur-
sion : A method applicable for “large” data sets. IEEE Trans. Systems, Man, and
Cybernetics - Part B, SMC-34(3) 1384–1397 2004.
18. D. P. Huttenlocher, G. A. Klanderman, W. J. Rucklidge.: Comparing images using
the Hausdorff distance. IEEE Trans. Pattern Anal. and Machine Intell., PAMI-
15(9) 850–863 1993.
19. B. Guo, K. -M. Lam, K. -H. Lin, W. -C. Siu.: Human face recognition based on
spatially weighted Hausdorff distance. Pattern Recognition Letters, 24 499–507
2003.
20. C. Zhao, W. Shi, Y. Deng.: A new Hausdorff distance for image matching. Pattern
Recognition Letters, 26 581–586 2005.
21. B. -F. Wu, Y. -L. Chen, C. -C. Chiu.: A discriminant analysis based recursive
automatic thresholding approach for image segmentation. IEICE Trans. Inf. &
Syst., E88-D(7) 1716–1723 2005.
A New Model for Classifying DNA Code
Inspired by Neural Networks and FSA
School of Computing
University of Tasmania
Private Bag 100, Hobart
Tasmania 7001, Australia
{BHKang, Andrei.Kelarev, Arthur.Sale, R.Williams}@utas.edu.au
www.comp.utas.edu.au/users/{bhkang/,kelarev/,ahjs/,rwilliams/}
1 Introduction
Classification of data is important in data mining; see [39]. The results of this paper constitute the first essential and rather non-trivial step of work on an IRGS grant allocated by the University of Tasmania for the development and investigation of new Artificial Intelligence methods for the classification of DNA data
collected by the School of Plant Science and CRC for Sustainable Production
Forestry. This is why we are mainly interested in DNA sequences, and we record
all new definitions in this case. In fact, the results and concepts of this note are
applicable to larger classes of problems and can be used to classify texts and
documents, see for example [3], [4], [11], [21], [22], [23], [24], [27], [28], [29], [30],
[32], as well as sequences in datasets of various other kinds too.
The applications of neural networks to numerous practical tasks are very well known, and many useful results have been obtained with neural networks in various applied branches. For the purpose of classifying DNA sequences, however, it is impossible to use neural networks to directly process the sequences of nucleotides. As a guide we have to look at another very well known concept, the finite state automaton (FSA), used for analysing sequences. We refer to [6], [9],
[13], [14], [15], [16], [17], [19], [20], [31], [33], [34], [38] for background and some
relevant recent results on the subject. The first aim of the present paper is to
generalize the architecture of neural networks in order to encompass all FSA in
a new concept.
2 Preliminaries
We use standard concepts concerning graphs and algorithms, following [2] and
[35]. Throughout the word ‘graph’ will mean a directed graph, which is allowed
to have multiple edges and loops. We refer to [8] for preliminaries on algorithms
for computational analysis of DNA sequences.
Let us refer to the monographs Baldi and Brunak [1], Durbin, Eddy, Krogh
and Mitchison [5], Jones and Pevzner [10] and Mount [26] for preliminaries on
bioinformatics. Here we briefly recall that every DNA molecule is a double helix
consisting of two strands. Each strand is a sequence of 4 nucleotides or bases:
A (adenine), C (cytosine), G (guanine), and T (thymine). According to the
Watson-Crick complementarity each nucleotide in one strand is crosslinked to
a complementary nucleotide in another strand, and together they form a base
pair. For example, the human genome contains about 3 billion base pairs and
about 35,000 genes. In each DNA molecule, A and T always complement each
other: A in one strand is linked to T in the second spiral. Similarly, C and G
complement each other: C in one spiral is always linked to G in another strand.
If we know one sequence, it’s easy to determine its complement. Therefore the
sequence of base pairs in every DNA molecule can be represented with just one
string over the alphabet of four letters A,C,G,T. In this paper we consider the
problem of classifying strings of letters over the alphabet
X = {A, C, G, T }.
Accordingly, the set of all DNA sequences is precisely the set X ∗ of all strings
over X.
3 Main Notion
A classifier $CL(V, E, \ell, r)$ is a quadruple
$$CL = (V, E, \ell, r), \qquad (1)$$
where $V = \{v_1, \ldots, v_n\}$ is the set of vertices and $E$ is the set of edges of a graph $G = (V, E)$ with multiple edges allowed and with each edge $e$ labeled by a letter $\ell(e)$ of the alphabet $X$ and a real number $r(e)$. In other words, there are two functions
$$\ell : E \to X \quad \text{and} \quad r : E \to \mathbb{R}. \qquad (2)$$
The state (or current state) of the classifier $CL(V, E, \ell, r)$ is a labeling of all vertices by real numbers, i.e., a function
$$s : V \to \mathbb{R}. \qquad (3)$$
Notice that our model has some similarities with the concept of a finite state
automaton and that of a neural network, but is different from them.
The classifiers CL(V, E, , r) potentially can be used for both classification and
clustering. A classification of any given set of DNA sequences is a partition of
these sequences into several classes. Classifiers obtain classifications via various
algorithms for supervised learning. In this way the classification is known for
the given set of data. The problem is to construct a classifier that will produce
this classification, so that it can then be used to determine class membership of
new sequences. Initial partition is usually communicated by a supervisor to a
machine learning process constructing the classifier. A different problem is that
of clustering data. It deals with dividing a set of given sequences into classes
not known initially, but determined according to certain measures of similarities
between sequences. This is usually accomplished via a process of unsupervised
learning, see [39].
Now suppose that we want to use the classifier $CL(V, E, \ell, r)$ to analyse a DNA sequence
$$x_1, x_2, \ldots, x_N, \qquad (4)$$
where $x_1, \ldots, x_N \in X$. The initial state $s_0 : V \to \mathbb{R}$ can be chosen arbitrarily depending on the practical implementation. Then we use the labeled graph to recursively process all letters of the sequence (4) and modify the state of the graph.
Suppose that after we have considered the first $i \ge 0$ letters of (4) the state of the graph is $s_i : V \to \mathbb{R}$. Then we can determine the next state $s_{i+1}$ with the recursion
$$s_{i+1}(v) = \sum_{w \in V,\ (w,v) \in E} r((w,v))\, s_i(w). \qquad (5)$$
After the whole sequence (4) has been processed, for every vertex $v \in V$, we know the final value $s_N(v) \in \mathbb{R}$.
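A minimal sketch of recursion (5) on an edge-labeled, real-weighted graph. Because the edge labels introduced later in (14) are evidently what allows different sequences to be told apart, the sketch assumes that only edges whose label equals the current input letter contribute to the sum; this letter-gating is our reading of the model rather than something spelled out in (5).

    from collections import defaultdict

    # An edge is a tuple (source, target, letter, weight).

    def process_sequence(vertices, edges, s0, sequence):
        """Apply recursion (5) for every letter of the DNA sequence.

        Assumption (see above): an edge (w, v) contributes r((w, v)) * s_i(w)
        to s_{i+1}(v) only when its letter label equals the current letter.
        """
        state = dict(s0)                       # current state s_i : V -> R
        for letter in sequence:
            nxt = defaultdict(float)
            for (w, v, lab, r) in edges:
                if lab == letter:
                    nxt[v] += r * state.get(w, 0.0)
            state = {v: nxt.get(v, 0.0) for v in vertices}
        return state

    def classify(final_state, class_vertices):
        """Classification K_k: pick the class vertex with the largest final value."""
        return max(class_vertices, key=lambda v: final_state.get(v, 0.0))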
Let us now define the standard partitions which we are going to use in the classification of DNA sequences. The following standard partitions will be associated with the classifier $CL(V, E, \ell, r)$. For every $1 \le k \le N$, we define the classification $K_k$ as the one which divides all given DNA sequences into classes $C_1, \ldots, C_k$ by including the sequence (4) in the class $C_i = C_i^{(k)}$, where $i$ is chosen so that $1 \le i \le k$ and
$$s_N(v_i) = \max\{s_N(v_1), \ldots, s_N(v_k)\}.$$
Obviously, for $k > 1$, every classification $K_k$ can be obtained from $K_{k-1}$ by selecting certain elements in all classes
$$C_1^{(k-1)}, C_2^{(k-1)}, \ldots, C_{k-1}^{(k-1)}$$
of $K_{k-1}$ and including them in the new class $C_k^{(k)}$. Thus, every previous classification can be regarded as a simplified version of the next one, and every next classification is a refinement of the preceding one.
The main theorem of this paper establishes that the classifiers $CL(V, E, \ell, r)$ are capable of solving all classification tasks for any given dataset of DNA sequences.

Theorem 1. For each set $S$ of DNA sequences and every given partition
$$S = S_1 \,\dot\cup\, S_2 \,\dot\cup\, \cdots \,\dot\cup\, S_k, \qquad (6)$$
there exists a classifier
$$C = (V, E, \ell, r) \qquad (7)$$
with a standard classification
$$K : X^* = C_1 \,\dot\cup\, C_2 \,\dot\cup\, \cdots \,\dot\cup\, C_k \qquad (8)$$
such that the classes of partition (6) are determined by the classes of classification (8), so that $S_i = S \cap C_i$ for all $i = 1, \ldots, k$.
Proof. First, let us define convenient notation which will enable us to refer to all sequences and their base pairs. Putting $N = |S|$, denote the sequences of the set $S$ by $b^{(1)}, b^{(2)}, \ldots, b^{(N)}$. For each $i = 1, \ldots, N$, denote the bases of the sequence $b^{(i)}$ by the symbols $b_j^{(i)}$, where $j = 1, \ldots, m_i$, so that
$$b^{(i)} = b_1^{(i)}, b_2^{(i)}, \ldots, b_{m_i}^{(i)}$$
for all $i = 1, \ldots, N$. Suppose that the sequence $b^{(i)}$ belongs to the class $S_{\varphi(i)}$ of partition (6), where $\varphi$ is a function from $[1:N]$ into $[1:k]$.
Next, we introduce the following sets of vertices for the classifier $CL(V, E, \ell, r)$ we are going to construct:
$$V_0 = \{v_1, v_2, \ldots, v_k\}, \qquad (9)$$
$$V_i = \{v_1^{(i)}, v_2^{(i)}, \ldots, v_{m_i-1}^{(i)}\} \qquad (10)$$
for $i = 1, 2, \ldots, N$. In addition, choose a vertex $v_0$ which does not belong to any of the sets $V_0, V_1, \ldots, V_N$ and suppose that these sets are pairwise disjoint and all of their vertices are distinct. Put
$$V = V_0 \cup V_1 \cup \cdots \cup V_N \cup \{v_0\}. \qquad (11)$$
To simplify further notation and have uniform definitions, we are going to denote one and the same vertex $v_0$ by the alternative symbols $v_0^{(1)}, v_0^{(2)}, \ldots, v_0^{(N)}$ too. Similarly, for $i = 1, 2, \ldots, N$, we introduce a new symbol $v_{m_i}^{(i)}$ to be used as an alternative notation for the vertex $v_{\varphi(i)} \in V_0$. For $i = 1, 2, \ldots, N$, let us introduce the sets of edges
$$E_i = \{(v_0, v_1^{(i)}) = (v_0^{(i)}, v_1^{(i)}),\ (v_1^{(i)}, v_2^{(i)}),\ \ldots,\ (v_{m_i-1}^{(i)}, v_{m_i}^{(i)}) = (v_{m_i-1}^{(i)}, v_{\varphi(i)})\} \qquad (12)$$
and put
$$E = E_1 \cup E_2 \cup \cdots \cup E_N. \qquad (13)$$
It remains to define the initial state $s_0$ and the labels $\ell$ and $r$; see (2) and (3). For all $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, m_i$, put
$$\ell((v_{j-1}^{(i)}, v_j^{(i)})) = b_j^{(i)}, \qquad (14)$$
$$r((v_{j-1}^{(i)}, v_j^{(i)})) = 1. \qquad (15)$$
The initial state $s_0$ is defined by putting, for $v \in V$,
$$s_0(v) = \begin{cases} 1 & \text{if } v = v_0, \\ 0 & \text{otherwise}. \end{cases} \qquad (16)$$
This completes the definition of the classifier $CL(V, E, \ell, r)$.
Suppose that the classifier is used to process the sequence $b^{(i)}$, where $1 \le i \le N$. We are going to show by induction that after considering the first $j$ bases of the sequence the current state of the classifier will satisfy
$$s_j(v) = \begin{cases} 1 & \text{if } v = v_j^{(i)}, \\ 0 & \text{otherwise}, \end{cases} \qquad (17)$$
for any $v \in V$.
The induction basis is provided by (16). Suppose that equalities (17) have been established for some $0 \le j < m_i$. Then we can find the next state $s_{j+1}(v)$ using recursion (5).
First, consider the case where $v = v_{j+1}^{(i)}$. Since $E$ contains only one edge of the form $(w, v_{j+1}^{(i)})$, and $s_j(v_j^{(i)}) = 1$ by the induction assumption, (15) and (5) yield
$$s_{j+1}(v) = \sum_{w \in V,\ (w,v) \in E} r((w,v))\, s_j(w) \qquad (18)$$
$$= \sum_{w \in V,\ (w,\, v_{j+1}^{(i)}) \in E} r((w, v_{j+1}^{(i)}))\, s_j(w) \qquad (19)$$
$$= r((v_j^{(i)}, v_{j+1}^{(i)}))\, s_j(v_j^{(i)}) \qquad (20)$$
$$= s_j(v_j^{(i)}) \qquad (21)$$
$$= 1. \qquad (22)$$
Thus, for $v = v_{j+1}^{(i)}$, the required version of (17) holds indeed.
Second, assume that $v \ne v_{j+1}^{(i)}$. Consider any $w \in V$. If $w = v_j^{(i)}$, then $(w, v) \notin E$ by the choice of $v$. If, however, $w \ne v_j^{(i)}$, then $s_j(w) = 0$ by the induction assumption. Thus all summands in recursion (5) vanish and we get
$$s_{j+1}(v) = \sum_{w \in V,\ (w,v) \in E} r((w,v))\, s_j(w) \qquad (23)$$
$$= 0. \qquad (24)$$
This means that the desired version of (17) holds if $v \ne v_{j+1}^{(i)}$, too. By the principle of mathematical induction, it follows that (17) is always satisfied.
After all bases $b_1^{(i)}, \ldots, b_{m_i}^{(i)}$ of the sequence $b^{(i)}$ have been processed, the final state of the classifier $CL(V, E, \ell, r)$ turns into
$$s_{m_i}(v) = \begin{cases} 1 & \text{if } v = v_{m_i}^{(i)} = v_{\varphi(i)}, \\ 0 & \text{otherwise}. \end{cases} \qquad (25)$$
According to our definition, $b^{(i)}$ belongs to the class $C_{\varphi(i)}$ of the classification $K_k$, as required. This means that our classifier indeed produces a classification that agrees with the given partition of data, and so the proof is complete.
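The proof is constructive, and the construction is straightforward to mimic in code: every training sequence receives its own chain of fresh vertices leading from the shared start vertex v0 to the class vertex v_phi(i), with edge labels given by the bases (14) and all weights equal to 1 (15). The sketch below builds such a classifier from (sequence, class) pairs and replays the propagation to check that each training sequence ends with value 1 on its own class vertex, as in (25); it again assumes the letter-gated reading of recursion (5), and all names are illustrative.

    from collections import defaultdict

    def build_chain_classifier(labeled_seqs, num_classes):
        """Construct the classifier of Theorem 1 from (sequence, class) pairs.

        Vertices: class vertices 0..k-1, a start vertex "v0", and a fresh chain
        of internal vertices for every training sequence (cf. (9)-(13)).
        """
        edges = []                                  # (source, target, letter, r=1)
        for i, (seq, cls) in enumerate(labeled_seqs):
            prev = "v0"
            for j, base in enumerate(seq):
                # the last base leads into the class vertex; see (12) and (14)
                nxt = cls if j == len(seq) - 1 else ("chain", i, j)
                edges.append((prev, nxt, base, 1.0))
                prev = nxt
        vertices = {v for e in edges for v in (e[0], e[1])} | set(range(num_classes))
        s0 = {v: (1.0 if v == "v0" else 0.0) for v in vertices}   # cf. (16)
        return vertices, edges, s0

    def run(vertices, edges, s0, seq):
        """Propagate the state along label-matching edges (cf. recursion (5))."""
        state = dict(s0)
        for letter in seq:
            nxt = defaultdict(float)
            for (w, v, lab, r) in edges:
                if lab == letter:
                    nxt[v] += r * state[w]
            state = {v: nxt.get(v, 0.0) for v in vertices}
        return state

    # Check: every training sequence ends with value 1 on its own class vertex.
    data = [("ACGT", 0), ("ACGA", 1), ("TTGC", 1)]
    V, E, s0 = build_chain_classifier(data, num_classes=2)
    for seq, cls in data:
        final = run(V, E, s0, seq)
        assert max(range(2), key=lambda c: final[c]) == cls
    print("all training sequences classified as in the given partition")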
In a neural network, each node takes a weighted sum of its inputs and passes it through a threshold function, usually the sigmoid function. As indicated above, the classifiers $CL(V, E, \ell, r)$ are different from neural networks and finite state automata. The major difference is that neural networks and classifiers $CL(V, E, \ell, r)$
are designed to solve substantially different types of problems. Neural networks
cannot be directly applied to classification of DNA sequences without collections
of some additional data, for example, from microarrays. The reason for this is
that the operation of every neural network depends on a relatively small number
of input parameters, represented as continuous real values. Small changes to
the values of these parameters are not generally supposed to create changes to
the classification outcome. Hence it is impossible to encode whole long DNA
sequences in this way. In contrast, the classifiers CL(V, E, , r) can process all
base pairs of a given DNA sequence in succession.
Sophisticated continuous threshold functions used in neural networks lead to
another serious difference (see [25], Section 11). Although the current state of
a classifier CL(V, E, , r) appears similar to the state of a neural network, the
transition to the next state is accomplished in a completely different fashion.
Comparing the classifiers CL(V, E, , r) to finite state automata, let us just
note that each finite state automaton is used to divide its input into two classes
only. Besides, the edges of finite state automata do not have real numbers as
labels. These labels are inspired by analogy with neural networks. They make
classifiers CL(V, E, , r) more flexible than finite state automata. This is why it
is natural to expect that future research will demonstrate the possibility of sub-
stantial reduction to the size of the classifiers CL(V, E, , r) designed to handle
certain classification tasks.
6 Main Algorithm
After a classifier CL(V, E, , r) has been found, the next natural step is to make
it smaller. This can be achieved by identifying equivalent vertices. We say that a
classifier CL(V, E, , r) is minimal if it can no longer be simplified by combining
and identifying its vertices in some groups. As a guide to developing our mini-
mization algorithm we are going to use the established standard terminology for
analogous situations known in automata theory. Our new algorithm originates
from the reduction algorithm for finite state automata described in several books
(see, for example, [13], Section 3.7).
The minimization algorithm we are going to develop applies only to classifiers
of the special type used in the proof of our main theorem. Namely, here we re-
strict our attention to the classifiers where each current state is a characteristic
function of one of the vertices: it is equal to 1 at this vertex, and is equal to 0
at all other vertices. The special vertex will be called the vertex of the current
state.
The algorithm proceeds by identifying equivalent vertices, so that one can
combine them without affecting the action of the classifier CL(V, E, , r) on
input strings.
Recall that a binary relation $\rho$ on the set $V$ is a subset of the Cartesian product
$$V \times V = \{(u, v) \mid u, v \in V\}. \qquad (26)$$
For $v \in V$, the set $v_\rho$ is called the equivalence class of $\rho$ containing $v$. It is known and easy to verify that $\rho$ is an equivalence relation if and only if the sets $v_\rho$, $v \in V$, form a partition of $V$ into several equivalence classes.
Let $\rho$ be an equivalence relation on $C$. Next, we show how $\rho$ simplifies $C$ by combining all vertices which belong to the same classes of $\rho$. The resulting classifier will be called a quotient classifier. Namely, the quotient classifier $C/\rho$ is the quadruple
$$C/\rho = (V/\rho,\, E/\rho,\, \ell/\rho,\, r/\rho), \qquad (29)$$
where the sets $V/\rho$, $E/\rho$ and functions $\ell/\rho$, $r/\rho$ are defined as follows. The set $V/\rho$ is the set of all equivalence classes of $\rho$ on $V$. The set $E/\rho$ contains an edge $(u_\rho, v_\rho)$ with
$$(\ell/\rho)((u_\rho, v_\rho)) = x \in X \qquad (30)$$
if and only if there exist $u' \in u_\rho$ and $v' \in v_\rho$ such that $(u', v') \in E$ and $\ell((u', v')) = x$. In this case we set
$$(r/\rho)((u_\rho, v_\rho)) = \sum_{u' \in u_\rho,\ v' \in v_\rho,\ (u', v') \in E,\ \ell((u', v')) = x} r((u', v')). \qquad (31)$$
To simplify notation, we will use the same symbols $\ell$ and $r$ for the functions $\ell/\rho$ and $r/\rho$, too.
We say that two vertices of the classifier $CL(V, E, \ell, r)$ are ∗-equivalent if the result of the classification of each word by the classifier $CL(V, E, \ell, r)$ starting in the state of one of the vertices coincides with its classification result when it starts from the state of the second vertex.
In order to determine whether two vertices are *-equivalent, the algorithm
uses an iterative process based on k-equivalence. Two states are said to be
k-equivalent if every word of length ≤ k produces identical classification out-
comes in the case where the classifier CL(V, E, , r) starts in the state of the
first vertex, exactly as when it starts in the second vertex. It is straightforward
to verify that *-equivalence is a congruence.
In order to start the process, let us say that two vertices $s$ and $t$ of the classifier $C = (V, E, \ell, r)$ are 0-equivalent to each other if and only if they coincide. Next, suppose that for some $k \ge 0$ the $k$-equivalence has already been defined. Taking any two vertices $s$ and $t$ in $V$, we say that $s$ is $(k+1)$-equivalent to $t$ if and only if $s$ and $t$ are $k$-equivalent and, for each input letter $x \in X$, starting in the state of the vertex $s$ and processing the letter $x$ leads to exactly the same state as starting in the state of the vertex $t$ and processing the letter $x$, so that there is no difference between starting from $s$ or from $t$.
The method of computing the k-equivalence classes from (k − 1)-equivalence
classes is a dynamic programming algorithm. It finds the k-equivalence classes
by subdividing the (k − 1)-equivalence classes according to the change of state
of the classifier CL(V, E, , r) when it reads each of the letters in X.
Since the set of all vertices is finite, they cannot be combined indefinitely,
and at some stage the algorithm terminates. For some integer k ≥ 0, the set of
k-equivalence classes will coincide with the set of (k + 1)-equivalence classes. At
this stage we see that both k-equivalence and (k + 1)-equivalence are in fact the
∗-equivalence.
These explanations show how to find a minimal classifier $CL(V, E, \ell, r)$ equivalent to the original one: starting from the 0-equivalence, the $k$-equivalence classes are refined repeatedly until they stabilize, and the resulting ∗-equivalence classes are then merged into the quotient classifier.
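A sketch of the refinement loop for classifiers of the special type considered here, where a state is the characteristic function of a single vertex and a transition can therefore be stored as a partial function (vertex, letter) → vertex. Classical Moore-style partition refinement is used: blocks are split until, for every letter, all members of a block send the current state into the same block, and the surviving blocks are the classes to be merged in the quotient classifier. The starting partition below (class vertices separated from each other and from all remaining vertices) is an assumption made so that the refinement has something to merge, and the whole routine is an adaptation rather than the authors' exact enumerated steps.

    def minimize(vertices, step, classes):
        """Moore-style partition refinement.

        vertices : list of vertex names
        step     : dict mapping (vertex, letter) -> next vertex; the sketch
                   assumes at most one outgoing edge per (vertex, letter)
        classes  : dict mapping each class vertex to its class index
        """
        alphabet = sorted({letter for (_, letter) in step})
        # initial partition (assumption): class vertices apart, rest together
        block = {v: classes.get(v, -1) for v in vertices}
        while True:
            # a vertex's signature: its block plus the blocks of its successors
            signature = {
                v: (block[v],) + tuple(block.get(step.get((v, a))) for a in alphabet)
                for v in vertices
            }
            renumber, new_block = {}, {}
            for v in vertices:
                new_block[v] = renumber.setdefault(signature[v], len(renumber))
            if new_block == block:
                break
            block = new_block
        groups = {}
        for v in vertices:
            groups.setdefault(block[v], []).append(v)
        return list(groups.values())

    # Example: two chains spelling different prefixes into the same class vertex.
    verts = ["v0", "a1", "g1", "c1"]
    step = {("v0", "A"): "a1", ("a1", "C"): "c1",
            ("v0", "G"): "g1", ("g1", "C"): "c1"}
    print(minimize(verts, step, classes={"c1": 0}))   # merges the vertices a1 and g1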
7 Open Questions
Problem 2. Evaluate the running time and develop more efficient minimization
algorithms for these classifiers.
Two other related models used in the analysis of DNA sequences are Markov
Models and probabilistic finite state automata, see Baldi and Brunak [1], Durbin,
Eddy, Krogh and Mitchison [5], Jones and Pevzner [10] and Mount [26]. They
have been used to identify and classify segments of one DNA sequence and are
different from our model. It may make sense to explore the possibility of using
these notions to classify sets of whole large DNA sequences too. This leads to
the following questions suggested by the referees of this paper.
Problem 4. Investigate the running times and compare the classifications pro-
duced by our new model with those which can be obtained using Markov Models.
Problem 5. Investigate the running times and compare the classifications pro-
duced by our new model with those which can be obtained using probabilistic
finite state automata.
Acknowledgements
This research has been supported by the IRGS grant K14313 of the Univer-
sity of Tasmania and Discovery grant DP0449469 from the Australian Research
Council.
The authors are grateful to the referees for suggesting interesting open ques-
tions recorded in Problems 4 and 5.
References
1. Baldi, P. and Brunak, S.: “Bioinformatics : The Machine Learning Approach”.
Cambridge, Mass, MIT Press, (2001).
2. Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C.: “Introduction to Algo-
rithms”, The MIT Press, Cambridge, 2001.
3. Dazeley, R.P., Kang, B.H.: Weighted MCRDR: deriving information about rela-
tionships between classifications in MCRDR, AI 2003: Advances in Artificial Intel-
ligence, Perth, Australia, 2003, 245–255.
4. Dazeley, R.P., Kang, B.H.: An online classification and prediction hybrid system
for knowledge discovery in databases, Proc. AISAT 2004, The 2nd Internat. Conf.
Artificial Intelligence in Science and Technology, Hobart, Tasmania, 2004, 114–119.
5. Durbin, R., Eddy, S.R., Krogh, A. and Mitchison, G.: “Biological Sequence Anal-
ysis”. Cambridge University Press (1999).
6. Eilenberg, S.: “Automata, Languages, and Machines”. Vol. A,B, Academic Press,
New York, 1974.
7. Gallian, J.A.: Graph labeling, Electronic J. Combinatorics, Dynamic Survey DS6,
January 20, 2005, 148pp, www.combinatorics.org
8. Gusfield, D.: “Algorithms on Strings, Trees, and Sequences”, Computer Science
and Computational Biology, Cambridge University Press, Cambridge, 1997.
9. Holub, J., Iliopoulos, C.S., Melichar, B. Mouchard, L.: Distributed string matching
using finite automata, “Combinatorial Algorithms”. AWOCA 99, Perth, 114–127.
10. Jones, N.C. and Pevzner, P.A.: An Introduction to Bioinformatics Algorithms.
Cambridge, Mass, MIT Press, (2004). http://www.bioalgorithms.info/
11. Kang, B.H.: “Pacific Knowledge Acquisition Workshop”. Auckland, New Zealand,
2004.
12. Kelarev, A.V.: “Ring Constructions and Applications”. World Scientific, 2002.
13. Kelarev, A.V.: “Graph Algebras and Automata”. Marcel Dekker, 2003.
14. Kelarev, A.V., Miller, M. and Sokratova, O.V.: Directed graphs and closure proper-
ties for languages. “Proc.12 Australasian Workshop on Combinatorial Algorithms”
(Ed. E.T. Baskoro), Putri Gunung Hotel, Lembang, Bandung, Indonesia, July 14–
17, 2001, 118–125.
15. Kelarev, A.V., Miller, M. and Sokratova, O.V.: Languages recognized by two-sided
automata of graphs. Proc. Estonian Academy of Sciences 54 (2005) (1), 46–54.
16. Kelarev, A.V. and Sokratova, O.V.: Languages recognized by a class of finite au-
tomata. Acta Cybernetica 15 (2001), 45–52.
17. Kelarev, A.V. and Sokratova, O.V.: Directed graphs and syntactic algebras of tree
languages. J. Automata, Languages & Combinatorics 6 (2001)(3), 305–311.
18. Kelarev, A.V. and Sokratova, O.V.: Two algorithms for languages recognized by
graph algebras. Internat. J. Computer Math. 79 (2002)(12) 1317–1327.
19. Kelarev, A.V. and Sokratova, O.V.: On congruences of automata defined by di-
rected graphs. Theoret. Computer Science 301 (2003), 31–43.
20. Kelarev, A.V. and Trotter, P.G.: A combinatorial property of automata, languages
and their syntactic monoids. Proceedings of the Internat. Conf. Words, Languages
and Combinatorics III, Kyoto, Japan, 2003, 228–239.
21. Lee, K.H., Kay, J., Kang, B.H.: Keyword association network: a statistical multi-
term indexing approach for document categorization. Proc. Fifth Australasian Doc-
ument Computing Symposium, Brisbane, Australia, (2000) 9 - 16.
22. Lee, K., Kay, J., Kang, B.H.: KAN and RinSCut: lazy linear classifier and rank-in-
score threshold in similarity-based text categorization. Proc. ICML-2002 Workshop
on Text Learning, University of New South Wales, Sydney, Australia , 36-43 (2002)
23. Lee, K.H., Kay, J., Kang, B.H., Rosebrock, U.: A comparative study on statis-
tical machine learning algorithms and thresholding strategies for automatic text
categorization. Proc. PRICAI 2002, Tokyo, Japan, (2002) 444–453.
24. Lee, K.H., Kang, B.H.: A new framework for uncertainty sampling: exploiting un-
certain and positive-certain examples in similarity-based text classification. Proc.
Internat. Conf. on Information Technology: Coding and Computing (ITCC2004),
Las Vegas, Nevada, 2004, 12pp.
25. Luger, G.F, “Artificial Intelligence. Structures and Strategies for Complex Problem
Solving”. Addison-Wesley, 2005.
26. Mount, D.: “Bioinformatics: Sequence and Genome Analysis”. Cold Spring Harbor
Laboratory, (2001). http://www.bioinformaticsonline.org/
27. Park, S.S., Kim, Y., Park, G., Kang, B.H., Compton, P.: Automated information
mediator for HTML and XML Based Web information delivery service. Proc. 18th
Australian Joint Conf. on Artificial Intelligence , Sydney, 2005, 401–404.
28. Park, G.S., Kim, Y.S., Kang, B.H.: Dynamic mobile content adaptation according
to various delivery contexts. J. Security Engineering 2 (2005) 202-208.
29. Park, G.S., Kim, Y.T., Kim, Y., Kang, B.H.: SOAP message processing perfor-
mance enhancement by simplifying system architecture. J. Security Engineering 2
(2005) 163–170.
30. Park, G.S., Park, S., Kim, Y., Kang, B.H.: Intelligent web document classification
using incrementally changing training data Set, J. Security Engineering 2 (2005)
186–191.
31. Păun, G. and Salomaa, A.: “New Trends in Formal Languages”. Springer-Verlag,
Berlin, 1997.
32. Petrovskiy, M.: Probability estimation in error correcting output coding framework
using game theory. AI 2005: Advances in Artificial Intelligence, Sydney, Australia,
2005, Lect. Notes Artificial Intelligence 3809 (2005) 186–196.
33. Pin, J.E.: “Formal Properties of Finite Automata and Applications”. Lect. Notes
Computer Science 386, Springer, New York, 1989.
34. Rozenberg, G. and Salomaa, A.: “Handbook of Formal Languages”. Vol. 1, Word,
Language, Grammar, Springer-Verlag, Berlin, 1997.
35. Smyth, B.: “Computing Patterns in Strings”. Addison-Wesley, 2003.
36. Sugeng, K.A., Miller, M., Slamin and Bača, M.: (a, d)-edge-antimagic total label-
ings of caterpillars. Lecture Notes in Comput. Sci. 3330 (2005) 169–180.
37. Tuga, M. and Miller, M.: Δ-optimum exclusive sum labeling of certain graphs with
radius one. Lecture Notes in Comput. Sci. 3330 (2005) 216–225.
38. van Leeuwen, J.: “Handbook of Theoretical Computer Science”. Vol. A,B, Algo-
rithms and Complexity. Elsevier, Amsterdam, 1990.
39. Witten, I.H. and Frank, E.: “Data Mining: Practical Machine Learning Tools and
Techniques with Java Implementations”. Morgan Kaufmann, 2005.
Improvements on Common Vector Approach
Using k-Clustering Method
1 Introduction
Voice signals contain psychological and physiological properties of the speakers as well as dialect differences, acoustical environment effects, and phase differences [1]. For these reasons, even the same utterance shows different characteristics when it comes from a different speaker, and these characteristics make it difficult to extract the common properties of a voice class (word or phoneme). Most speech recognition methods recognize a voice by the following process: extract the common properties of the word, build comparative patterns of the observed voice, and compare the two. Therefore, the efficiency of a recognition method depends entirely on both the extraction of the common property and the decision among words by comparison. In this paper, we propose an advanced CVA method. The CVA algorithm extracts the common properties from training voice signals easily and does not need complex calculation [1-4]; in addition, CVA has shown high accuracy in recognition results. CVA, however, has a problem when many training voices are given [4]. Generally, to obtain the optimal common vectors of a voice class, various voices should be used for training, yet it is impossible to obtain consistently high recognition accuracy with CVA because CVA is limited in the number of training voices it can use. To solve this problem and improve the recognition rate, a k-clustering method is used. Various experiments were performed using the voice
signal database made by ETRI to prove the validity of the proposed method. The results of the experiments show improvements in performance, and the problem of CVA can be solved by the proposed method without additional computational burden.
The proposed method consists of three main parts. The first part is the k-clustering, which clusters the training voice signals into k small subspaces; the next part is the common vector extraction, which extracts a common vector from each small subspace; and the last part derives the total common vector from the extracted common vectors of the subspaces.
We now explain the whole algorithm of the k-clustering method and give the detailed numerical expressions. Let there be given $n$-dimensional linearly independent vectors $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$ with $m < n$. Here each $a_i$ ($i = 1, 2, \ldots, m$ denotes the speaker number) belongs to the class of one of the spoken words. First, the vectors $a_i$ are clustered into $k$ small subspaces (where $k < m$ and $k \le n - 1$). Next, the common features are extracted from each of the $k$ small subspaces using the conventional CVA; the orthonormal vector set of each subgroup is used for the extraction of its common vector. In the last step, the total common vector is extracted once more from the common vectors of the $k$ subgroups. The extracted total common vector represents the common component of the vectors $a_i$, and a 100% recognition rate is obtained for the training word set.
There exist various methods for clustering the vectors $a_i$ into $k$ small subspaces. In this paper, we consider two methods of clustering: the first clusters the vectors at random, and the other uses the k-means algorithm [5-6]. In our experiments, neither clustering at random nor using k-means gave an invariable recognition rate: depending on how the vectors $a_i$ are clustered, each small subspace changes, and this affects the recognition rate. Clustering methods do not always produce the small subspaces in the same form; in the k-means algorithm, the elements of the small subspaces change according to how the initial values are established. Therefore, if we form the small subspaces using the above methods, we may obtain a good recognition rate but also a bad one, which can be a serious problem when they are applied to a practical recognizer. In this paper we used the best clustering result obtained from several independent trials. The method used for clustering the training vectors with k-means is as follows.
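The exact k-means variant used by the authors (cf. [5-6]) is not reproduced here; as a generic sketch consistent with the remark that the best of several independent trials was kept, the following clusters the training vectors with plain k-means restarted from several random initializations and retains the partition with the smallest within-cluster scatter. All sizes and parameters are illustrative.

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=None):
        """Plain k-means on the rows of X; returns (labels, centers, inertia)."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array([X[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X - centers[labels]) ** 2).sum()
        return labels, centers, inertia

    def best_of_trials(X, k, trials=10):
        """Keep the clustering with the smallest inertia over several random runs."""
        best = None
        for t in range(trials):
            result = kmeans(X, k, seed=t)
            if best is None or result[2] < best[2]:
                best = result
        return best

    # Example: cluster m training vectors a_1, ..., a_m into k small subspaces.
    m, n, k = 40, 64, 4            # illustrative sizes with k < m and k <= n - 1
    A = np.random.randn(m, n)
    labels, centers, inertia = best_of_trials(A, k)
    print("cluster sizes:", np.bincount(labels, minlength=k))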
The method that extracts the common vector from each small subspace is the same as the conventional CVA, in which a reference vector $r$ is used. The CVA in the $j$th small subspace can be written as follows. Suppose $a_i^j$ ($j \le k$, $i = 1, 2, \ldots, t < n-1$) is a vector of the $j$th small subspace. The difference vectors with respect to the reference vector are
$$b_1 = a_1^j - r,\quad b_2 = a_2^j - r,\quad \ldots,\quad b_t = a_t^j - r, \qquad (5)$$
and from Eq. (5) the common vector of the $j$th small subspace can be written as Eq. (6). Using the orthonormal vector set $z_1^j, \ldots, z_t^j$ of the subgroup,
$$\tilde{a}_i^j = \langle a_i^j, z_1^j \rangle z_1^j + \langle a_i^j, z_2^j \rangle z_2^j + \cdots + \langle a_i^j, z_t^j \rangle z_t^j. \qquad (7)$$
The common vector $\tilde{a}^j_{\mathrm{common}}$ extracted from each small subspace can be written as Eq. (8), and these vectors are linearly independent:
$$\tilde{a}^1_{\mathrm{common}} = \tilde{a}^{\mathrm{total}}_{\mathrm{common}} + \tilde{a}^1_{\mathrm{common,diff}}, \quad \ldots, \quad \tilde{a}^k_{\mathrm{common}} = \tilde{a}^{\mathrm{total}}_{\mathrm{common}} + \tilde{a}^k_{\mathrm{common,diff}}. \qquad (8)$$
In Eq. (8), $\tilde{a}^{\mathrm{total}}_{\mathrm{common}}$ denotes the total common vector and $\tilde{a}^j_{\mathrm{common,diff}}$ represents the individual features of the common vector extracted from the $j$th small subspace.
Therefore, we can obtain the common component of all the training vectors by extracting, once more with CVA, the common vector from these first-stage common vectors. The derived total common vector becomes a reference vector that is used to decide the word in the decision part. When an input vector arrives, it is first projected onto each small subspace and the difference between the input vector and the projected vector is derived. Secondly, the likelihood with respect to the common vector of each small subspace is calculated. Lastly, the common vector of the small subspace with the maximum likelihood is projected onto the subspace of the total common vector and its likelihood is compared with the total common vector. Using the process of Fig. 1, the recognized word is finally decided as the one with the highest likelihood. The proposed method requires a similar amount of computation to the conventional method and solves the problem of the limitation on the number of training vectors.
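A minimal sketch of the two extraction stages under the reading given above: within each cluster, the difference vectors of Eq. (5) are orthonormalized and a common vector is obtained by removing from a training vector its expansion (7) along those directions; applying the same operation once more to the k cluster common vectors yields the total common vector. The use of SVD for the orthonormal basis, the choice of the first vector as reference, and the toy data are assumptions of the sketch.

    import numpy as np

    def orthonormal_basis(diffs, tol=1e-10):
        """Orthonormal basis z_1, ..., z_t of the difference subspace (via SVD)."""
        diffs = np.atleast_2d(np.asarray(diffs, float))
        if diffs.shape[0] == 0:
            return np.zeros((0, diffs.shape[1]))
        u, s, vt = np.linalg.svd(diffs, full_matrices=False)
        return vt[s > tol]

    def common_vector(vectors):
        """CVA in one small subspace: subtract the expansion (7) of a training
        vector along the orthonormal directions of the difference vectors (5)."""
        A = np.asarray(vectors, float)
        r = A[0]                          # reference vector, as in Eq. (5)
        Z = orthonormal_basis(A[1:] - r)  # z_1^j, ..., z_t^j
        return A[0] - Z.T @ (Z @ A[0])    # a_common = a - sum_l <a, z_l> z_l

    # Per-cluster common vectors and the total common vector for one word class.
    m, n, k = 40, 64, 4
    A = np.random.randn(m, n)             # training vectors a_1, ..., a_m (toy data)
    labels = np.arange(m) % k             # cluster labels (e.g., from k-means)
    cluster_cvs = [common_vector(A[labels == j]) for j in range(k)]
    a_total = common_vector(cluster_cvs)  # extract the common component once more
    print(a_total.shape)

The decision part would then, as described above, project an input vector onto each cluster's subspace and compare likelihoods against the total common vector; that comparison is not sketched here.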
Fig. 1. Overall structure of the proposed recognizer: an input part (feature analysis, 2-dimensional DCT, fixed-size parameters, inverse 2-dimensional DCT); a recognition part in which, for each word class, the input is projected onto each of the k small subspaces and onto the total common vector subspace; and a decision part performing maximum likelihood selection.
The ETRI isolated word database, which consists of the 20 digits (0-9) and 22
words, was used to evaluate the performance of the proposed algorithm. In the
20 digits, 20 male and 20 female speakers each recorded 4 repetitions of each
word, for a total of 3200 utterances. In the 22 words, 48 male and 43 female speakers each recorded 1 repetition of each word, for a total of 2002 utterances.
Fig. 2. Recognition rates of variations of the k-clustering number for isolated digit and
isolated word
The input speech is sampled at an 8 kHz rate and stored with 16 bits. The features are extracted from 15 msec frames overlapped by 50%. To confirm the effectiveness of the proposed method, we performed experiments using the auditory model (32nd order), Cepstrum (32nd order), LPC (12th order), LSP (32nd order) and MFCC (32nd order) [7-9]. The extracted parameters are normalized to a fixed size using the two-dimensional DCT and used as the input of the CVA and k-clustering methods [10]. From the results, we find that changing the number of small subspaces does not have a large effect on the recognition result, but the recognition rate decreases slightly as the number of clusters increases.
Recognition rates (%) for the isolated digit set:

Method           Conventional CVA (k=1)   k=2     k=4     k=6     k=8
MFCC             98.12                    97.50   96.50   96.00   95.00
Auditory Model   98.12                    98.50   98.50   98.25   97.25
Cepstrum         94.43                    95.25   95.25   93.00   93.25

Recognition rates (%) for the isolated word set:

Method           Conventional CVA (k=1)   k=2     k=4     k=6     k=8
MFCC             94.91                    94.47   94.11   94.11   93.49
Auditory Model   94.38                    95.45   94.38   94.20   94.74
Cepstrum         94.38                    95.98   94.83   93.93   93.49
5 Conclusion
The CVA algorithm extracts the common properties from training voice signals easily and does not require complex calculation, and it has shown high accuracy in recognition results. However, CVA has the drawback that it cannot be used with many training voices. In this paper, we proposed the k-clustering method, which improves CVA, and carried out experiments on Korean speaker-independent isolated word recognition. The k-clustering method removes the drawback of CVA and improved the recognition rate by 1.39% without a significant change in the amount of computation. The recognition rate of the proposed method, however, varies with the number of clusters, so determining the optimal number of clusters is critical when applying k-clustering CVA. In this study the number of clusters was explored heuristically, but several criteria can be applied and compared for finding the optimal number of clusters; if this problem is solved, the k-clustering CVA algorithm will be simpler to implement. Further research will develop isolated word recognition with various clustering algorithms, such as fuzzy c-means clustering.
References
1. Bilginer, M. et al.: A novel approach to isolated word recognition, Speech and
Audio Processing, IEEE Trans. 7 (1999) 620-628
2. Cevikalp, Hakan, Wilkes, Mitch: Discriminative common vectors for face recogni-
tion, IEEE Trans. Pattern analysis and machine intelligence 27, no.1, Jan. (2005)
3. Gulmezoglu, M. B., Dzhafarov, V. and Barkana, A.: The common vector approach
and its relation to the principal component analysis, IEEE Trans. Speech and
Audio Processing 9 (2001) 655-662
4. Gulmezoglu, M. B., Dzhafarov, V. and Barkana, A.: Comparison of common vec-
tor approach and other subspace methods in case of sufficient data, in Proc. 8th
conference on Signal Processing and Applications, Belek, Turkey (2000) 13-18
5. Duda, Richard O., Hart, Peter E., Stork, David G.: Pattern Classification, Wiley
Interscience (2001)
6. Cho, C.H.: Modified k-means algorithm, The Journal of Acoustical Society of Korea
19 no.7 (2000) 23-27.
7. Wallace, G. K.: The JPEG still picture compression standard, Consumer Electron.,
IEEE Trans. 38 (1992) 18-34.
8. Lay, David C.: Linear algebra and its applications, Addison Wesley (2000)
9. Deller, John R., Proakis, John G., Hansen, John H. L.: Discrete-time processing
of speech signals, Macmillan Publishing Company. (1993)
10. Nam, M.W., Park, K.H., Jeong, S.G., Rho, S.Y.: Fast algorithm for recognition of
Korean isolated word, The Journal of Acoustical Society of Korea 20, no.1 (2001)
50-55
The Method for the Unknown Word Classification
Abstract. Natural Language Processing is a hard task. For practical Natural Language Processing, a technique for handling unknown words is necessary. In this paper, we introduce a method for understanding the meanings of unknown words. Many terms are newly created and cannot be found in a dictionary. Unknown words generally arise to reflect new phenomena and technologies, and they are created at a dramatic rate because of rapid changes in society. However, it is hard to define the meanings of all unknown words in a dictionary. In this paper, we therefore focus on how a machine can understand the meanings of unknown words. We propose a method to classify unknown words using the relevancy values between all nouns in the document and their TF values.
1 Introduction
For real Natural Language Processing (NLP), there are several tasks that have to be solved, and one of the significant ones is the classification of unknown words. An NLP system frequently encounters words that are not in its lexicon; here, we define unknown words as the terms which do not exist in the lexicon. For an NLP system to perform well, it should understand unknown words: even when they do not occur very often, they affect the quality of an NLP system. For the past several years, the importance of this task for Natural Language Processing systems has been recognized. However, it is a very hard task because huge numbers of unknown words can be created every day and they cannot be completely registered in the lexicon by humans. Therefore, robust approaches for processing unknown words are needed [1][2][3]. The method presented in this paper allows the automatic detection of the meanings of unknown words: a machine applying our technique can understand unknown words and classify them automatically instead of a human knowledge engineer. Our approach requires several algorithms, such as unknown word detection, unknown word understanding, and classification. The proposed method uses the Relevancy values among terms based on WordNet together with their TF values.
This paper is organized as follows: Section 2 introduces the background technologies, and we then present our proposed method for understanding the meanings of unknown words.
In this section we describe our approach to understanding the meanings of unknown words. Under this view, unknown word processing is based on the following existing techniques:
a) TF values
b) Relevancy values in WordNet
c) Noun detection
A brief explanation of each technique is as follows.
In this paper, we use WordNet to detect the nouns in sentences and to obtain the relevancy values among these nouns. WordNet is a semantic lexicon for the English language. It groups English words into sets of synonyms called synsets and records the various semantic relations between these synonym sets; English nouns, verbs, adjectives and adverbs are organized into synonym sets in WordNet [4]. Most synsets are connected to other synsets via semantic relations. In our approach, we consider the nouns and two semantic relations (Hypernym/Hyponym, Holonym/Meronym) in WordNet for measuring the relevancy values. The second basic technique in our approach is the TF value. Documents contain a huge number of terms, so it is very hard to determine which words are most important in a document. The importance of a word in a document is generally measured by its TF (term frequency) [5].
The formula for calculating TF is:

    tf_i = n_i / Σ_k n_k

where Σ_k n_k is the total number of term occurrences in the document and n_i is the number of occurrences of the specific term i. The values obtained through the TF formula, together with the relations between terms, are the indispensable data in our approach.
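As an illustration, the following is a minimal Python sketch of the TF computation for a document whose nouns have already been extracted (the noun list and its contents are hypothetical):

from collections import Counter

def term_frequencies(nouns):
    """Compute tf_i = n_i / sum_k n_k for every noun occurring in a document."""
    counts = Counter(nouns)
    total = sum(counts.values())
    return {noun: count / total for noun, count in counts.items()}

# Example with a hypothetical noun list:
print(term_frequencies(["car", "sedan", "car", "price", "car"]))
# -> {'car': 0.6, 'sedan': 0.2, 'price': 0.2}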
Processing unknown words is a combination of the two techniques above. In order to process a document containing unknown words, we use all the nouns contained in the document. In our approach, unknown words are processed in the following steps.

STEP 1. Detecting the unknown words in the document

Ci : Set of nouns ;
WN : Nouns in WordNet ;
If (Ci ∉ WN)
{
    Unknown_Word = Ci ;
}
Unknown_Word : The result after processing STEP 1
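The paper does not name a particular WordNet toolkit; assuming NLTK's WordNet interface, STEP 1 can be sketched in Python as follows (the example noun list is hypothetical):

from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus

def detect_unknown_words(nouns):
    """STEP 1: a noun with no WordNet synset is treated as an unknown word."""
    return [n for n in nouns if not wn.synsets(n, pos=wn.NOUN)]

# Example: 'S-Class' is not in WordNet, while 'car' and 'sedan' are.
print(detect_unknown_words(["car", "sedan", "S-Class"]))   # -> ['S-Class']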
STEP 2. Extracting the sentences containing the unknown word and all nouns in the document except the unknown word

In this step, we extract the sentences containing the unknown word from the document, and then detect all nouns in the document except the unknown word. In our method, we assume that all nouns in the document are related to the unknown word; moreover, we regard the nouns in sentences that contain the unknown word as closely related to it. Hence, we divide the document into two parts:
- Sentences Part: all sentences containing the unknown word
- Nouns Part: all nouns in the document except the unknown word
The processing flow of sentence extraction is as follows:

Table 2. The pseudo code for extracting the sentences containing the unknown word
Using the pseudo code in Table 4, the TF values of all nouns in Table 3, for example, are calculated as shown in Table 5.
Table 5 shows the TF values of all nouns in the Nouns part. In previous approaches to understanding unknown words, the TF values play the core role; however, TF alone does not support a complete understanding of unknown words. Hence, we consider the Relevancy value between nouns as well as the TF values.
(2) Measure the Relevancy values between the nouns in the Sentences part and the nouns in the Nouns part, based on the pseudo code in Table 6.

After detecting the nouns, we measure the Relevancy value, which expresses how closely the nouns are related to the unknown word. Here, we use the semantic relations between nouns defined in WordNet. As mentioned before, there are many relations in WordNet, but we use only two (Hypernym/Hyponym, Holonym/Meronym) for measuring the Relevancy value among nouns.
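The pseudo code for this measurement (Table 6) is not reproduced in this fragment. One plausible reading, sketched below with NLTK's WordNet interface, counts how many synsets of the other nouns are directly linked to a noun through the Hypernym/Hyponym and Holonym/Meronym relations; the one-step restriction and the function names are assumptions:

from nltk.corpus import wordnet as wn   # requires the NLTK WordNet corpus

def related_synsets(synset):
    """Synsets linked to `synset` through the two relation families used above."""
    return set(synset.hypernyms() + synset.hyponyms()
               + synset.member_holonyms() + synset.part_holonyms()
               + synset.substance_holonyms()
               + synset.member_meronyms() + synset.part_meronyms()
               + synset.substance_meronyms())

def relevancy(noun, other_nouns):
    """Count how many synsets of the other nouns are directly related to `noun`."""
    others = {s for n in other_nouns for s in wn.synsets(n, pos=wn.NOUN)}
    return sum(len(related_synsets(s) & others) for s in wn.synsets(noun, pos=wn.NOUN))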
Table 7. Measured Relevancy values among nouns for the sample document shown in Table 3

Synset_ID 04207742 — related Synset_IDs and relationships: ~ 10720570, ~ 09729204, %p 04410590, %p 04254824, ~ 04008331, %p 03400842, %p 03389509, %p 03228252, ~ 03006338 — Relevancy value: 9
Synset_ID 02853224 — related Synset_IDs and relationships: ~ 10066029, ~ 09933701, ~ 09908263, ~ 09843239, ~ 09843239, ~ 09819657, ~ 09721227, ~ 09624379, ~ 09608190, ~ 09598437, ~ 09463859, ~ 09385835, ~ 09285577, ~ 09223355, ~ 09145707, ~ 09144663, ~ 09019701, ~ 09013278, ~ 09012224, %p 04916889, #m 07463651, @ 00003226 — Relevancy value: 22
Synset_ID 08103697 — related Synset_IDs and relationships: ~ 08134364, ~ 08124727, ~ 08120943, ~ 08087842, ~ 08066770, ~ 08063710, ~ 08034339, ~ 07992043, ~ 07982095 — Relevancy value: 9
Synset_ID 00017572 — related Synset_IDs and relationships: ~ 14254673, ~ 14130483, ~ 14051444, ~ 14051242, ~ 14050897, ~ 14027638 — Relevancy value: 6
Table 7 shows the measured Relevancy values among nouns, based on the relations defined in WordNet.
(3) Through STEP 4, we can determine the meaning of the unknown word. Using the results in Table 9, the unknown word (S-Class) can be classified as a car.
Table 9. CV Results

Noun (Synset_ID)   CV         Concepts
02853224           798.0001   Car, auto, automobile, machine, motorcar
05394410           352.7999   Arrangement, organization, organization, system
05396456           338.8001   Design, plan
04008331           313.6001   Sedan
07924048           293.9999   System, scheme
Table 9 shows the CV results for each noun. In Table 9, the noun 'car' has the highest CV value, so we can conclude that 'S-Class' is very closely related to car. Hence, the efficiency of our approach is confirmed.
From the example in Section 2, we are confident that our approach is suitable for defining the meaning of unknown words. We have evaluated our proposed method formatively. In
order to examine the validity of the method, the approach was evaluated on a set of document resources. Our testing environment is as follows:
(1) Objective: to measure the efficiency of our approach.
(2) Materials: WordNet as the knowledge source, and five documents for each of five unknown words – TGV (train), Fanta (beverage), Zindane (soccer player), S-Class (car) and Marlboro (cigarette).
(3) Results: Table 10 shows the CV values obtained for the five documents of each unknown word.
TGV
  Document 1: train 8, Korea 8, Pusan 8, route 7, South 7
  Document 2: France 7, travel 6, pass 4, rail 4, train 4
  Document 3: France 4, train 2, transportation 2, rail 2, passenger 2
  Document 4: road 24, train 18, way 15, track 12, car 8
  Document 5: train 2, station 2, Paris 2, Sur 2, rail 2
Fanta
  Document 1: can 6, mango 5, orange 4, diet 3, lemon 3
  Document 2: Thailand 81, mango 13, product 12, drink 11, orange 11
  Document 3: food 48, orange 9, soda 8, product 8, flavor 7
  Document 4: orange 8, company 4, war 4, coca 4, Germany 2
  Document 5: orange 144, lemon 12, can 6, flavor 4, taste 4
Zindane
  Document 1: ball 10, cup 5, player 5, name 4, French 4
  Document 2: France 70, Paris 30, world 11, cup 8, Frenchman 7
  Document 3: man 12, world 11, player 7, cup 7, game 6
  Document 4: soccer 6, world 6, cup 4, France 4, player 3
  Document 5: game 2, football 2, generation 2, trailer 2, player 2
S-Class
  Document 1: sedan 6, car 5, equipment 3, vehicle 3, road 2
  Document 2: car 15, symbol 8, road 8, coupe 6, system 6
  Document 3: car 24, system 18, body 12, time 12, control 11
  Document 4: car 24, model 9, system 7, seat 7, body 4
  Document 5: sedan 13, price 10, research 8, car 7, model 3
Marlboro
  Document 1: history 14, advertising 12, life 10, marketing 6, cigarette 5
  Document 2: cigarette 8, brand 5, man 3, Morris 3, Philip 3
  Document 3: cigarette 3, brand 3, sales 2, world 2, country 2
  Document 4: cigarette 35, search 9, image 7, brand 6, man 5
  Document 5: television 30, Cancer 8, death 8, man 7, cigarette 4
The test results show that, for tangible unknown words such as TGV, Fanta, S-Class and Marlboro, effective classification is possible using our method. However, the results for notional unknown words such as Zindane have lower accuracy than those for the tangible words. Based on the test results, we determine the meaning of each unknown word. Figure 1 shows the classification of the unknown word meanings using formula (1):

    Meaning of the unknown word = Highest Value(Ci), where Ci = Σ W(Ni)    (1)

where W is a weight value and Ni is a noun in the results. In formula (1), we assign the W values (5, 4, 3, 2, 1) to the nouns of each document in decreasing order of their CV values. We then compute the final result using formula (1). Finally, we assume that the word with the highest value is the one most related to the unknown
word. Hence, we classify the meaning of the unknown word as the word selected through formula (1).
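A minimal Python sketch of formula (1), assuming the per-document noun rankings by CV value are already available; the example data are the five TGV documents of Table 10:

from collections import defaultdict

def classify_unknown_word(per_document_rankings, weights=(5, 4, 3, 2, 1)):
    """Formula (1): sum the rank weights over all documents and pick the best noun."""
    scores = defaultdict(int)
    for ranking in per_document_rankings:
        for noun, w in zip(ranking, weights):
            scores[noun] += w
    return max(scores, key=scores.get), dict(scores)

tgv = [["train", "Korea", "Pusan", "route", "South"],
       ["France", "travel", "pass", "rail", "train"],
       ["France", "train", "transportation", "rail", "passenger"],
       ["road", "train", "way", "track", "car"],
       ["train", "station", "Paris", "Sur", "rail"]]
print(classify_unknown_word(tgv))   # 'train' receives the highest weighted score (19)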
The experimental results in Figure 1 suggest that efficient unknown word classification is possible using our approach. In particular, the results for Marlboro (meaning cigarette), TGV (meaning train) and S-Class (meaning car) strongly support the performance of our approach. Hence, we conclude that robust unknown word classification is possible using our approach.
Fig. 1. Classification of the unknown word meanings using formula (1): the weighted scores (scale 0 to 25) of the top candidate words for each unknown word (Marlboro: cigarette, brand; S-Class: car, sedan, system; Zindane: world, player, cup; Fanta: mango, orange; TGV: train, France)
4 Conclusion
In this paper, we have tried to understand the meaning of unknown words. Unknown words are increasing dramatically nowadays, yet their processing has received little research attention, although this task is very important for real Natural Language Processing. In our approach, we assume that the nouns in a document are related to the unknown word. We therefore calculate CV values using the TF values and Relevancy values of the nouns. Based on the test results, we are confident that the CV values strongly reflect which noun is most related to the unknown word.
Acknowledgement
This research was supported by the Program for the Training of Graduate Students in Regional Innovation, which was conducted by the Ministry of Commerce, Industry and Energy of the Korean Government.
References
1. H. Ishikawa, A. Ito and S. Makino, 1993. Unknown Word Processing Using Bunsetsu-
automaton, 2nd class of Technical report of IEICE, LK-92-17, pp.1-8 (in Japanese).
2. T. Kamioka and Y. Anzai, 1988. Syntactic Analysis of Sentences with unknown words by
Abduction Mechanism, Journal of Artificial Intelligence, Vol.3, pp.627-638 (in Japanese).
3. C. Kubomura, T. Sakurai and H. Kameda, 1996. Evaluation of Algorithms for Unknown
Word Acquisition, Technical report of IEICE, TL96-6, pp.21-30 (in Japanese).
4. Scott, S., Matwin, S.: Text Classification using WordNet Hypernyms. In the Proceeding of
Workshop – Usage of WordNet in Natural Language Processing Systems, Montreal, Can-
ada (1998).
5. Gelbukh, A., Sidorov, G., Guzman, A.: Use of a Weighted Topic Hierarchy for Document Classification. In Václav Matoušek et al. (eds.): Text, Speech and Dialogue, Proc. 2nd International Workshop. Lecture Notes in Artificial Intelligence, No. 92, ISBN 3-540-66494-7, Springer-Verlag, Czech Republic (1999) 130-135.
An Ontology Supported Approach to Learn
Term to Concept Mapping
Paris 5 University,
Paris, 75006,
France
[email protected], [email protected]
1 Introduction
the field ontology models verbs of the field as relations holding between the concepts. If this is the case, labelling strategies use the ontology and the extracted conceptual graphs to assign field-specific terms to field concepts.
We approach this topic by answering a number of questions: which method should be used to extract verb relations from the corpus? How can conceptual graphs be learned from the extracted verb relations? Those questions are analyzed in Sections 2 and 3. Given a domain ontology and a set of conceptual graphs, which strategies should be used to assign terms to concepts? The solution is discussed in Section 4. A first experimentation in the field of accidentology is described and its results are presented in Section 5. Related approaches are presented in Section 6. Conclusions and perspectives of this work end the paper.
The similarity measures implemented for this work are detailed below. The Jaccard coefficient considers a string as composed of several sub-strings and calculates the similarity between two strings S and T as:

    Jaccard(S, T) = |S ∩ T| / |S ∪ T|

This measure is the number of sub-strings common to S and T divided by the number of all sub-strings of S and T. If we consider characters as sub-strings, the coefficient expresses the similarity by taking into account only the number of common characters of S and T.
The Jaro and Jaro-Winkler coefficients, introduced below, express the similarity by taking into account the number and the positions of the characters shared by S and T. Let s = a1..ak and t = b1..bl be two strings. A character ai ∈ s is considered common to both strings if there is a bj ∈ t such that ai = bj and i − H ≤ j ≤ i + H, where H = min(|s|, |t|)/2. Let s' = a'1..a'k be the characters of s common to t, and t' = b'1..b'l the characters of t common to s. We define a transposition between s' and t' as an index i such that a'i ≠ b'i. If T_{s',t'} is the number of transpositions from s' to t', the Jaro coefficient calculates the similarity between s and t as follows:

    Jaro(s, t) = 1/3 · ( |s'|/|s| + |t'|/|t| + (|s'| − T_{s',t'}) / |s'| )

[5] proposes a version of this coefficient using P, the length of the longest prefix common to both strings. Let P1 = min(P, 4); then Jaro-Winkler is written:

    Jaro-Winkler(s, t) = Jaro(s, t) + (P1 / 10) · (1 − Jaro(s, t))
There are also hybrid approaches that calculate similarities recursively by analyzing sub-strings of the initial strings. Monge-Elkan uses two steps to calculate the similarity between s and t: the two strings are divided into sub-strings s = A1..Ak and t = B1..Bl, and the similarity is then given by:

    sim(s, t) = (1/k) · Σ_{i=1..k} max_{j=1..l} sim1(Ai, Bj)

where the values of sim1(Ai, Bj) are given by some similarity function, called the basic function, for example one of those presented above. Such a function is called a level 2 function. For this work, Monge-Elkan is implemented using the Jaccard, Jaro and Jaro-Winkler coefficients as basic functions.
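The following Python sketch implements the four coefficients as described above. Where the text is not explicit it makes two assumptions: the common prefix used by Jaro-Winkler is capped at four characters, and Monge-Elkan sub-strings are taken to be whitespace-separated tokens.

def jaccard(s, t):
    """Character-level Jaccard similarity: shared characters over all characters."""
    a, b = set(s), set(t)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def jaro(s, t):
    """Jaro similarity, following the definitions given above."""
    if not s or not t:
        return 0.0
    H = min(len(s), len(t)) // 2                 # match window from the text above
    matched = [False] * len(t)
    s_common = []
    for i, a in enumerate(s):
        for j in range(max(0, i - H), min(len(t), i + H + 1)):
            if not matched[j] and t[j] == a:
                matched[j] = True
                s_common.append(a)
                break
    t_common = [t[j] for j in range(len(t)) if matched[j]]
    if not s_common:
        return 0.0
    m = len(s_common)
    # half the mismatched positions, as in the usual Jaro definition
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3

def jaro_winkler(s, t, max_prefix=4):
    """Boost the Jaro score by the length of the common prefix (capped at 4)."""
    base = jaro(s, t)
    p = 0
    for a, b in zip(s, t):
        if a != b or p == max_prefix:
            break
        p += 1
    return base + (p / 10.0) * (1 - base)

def monge_elkan(s, t, basic=jaro):
    """Level-2 (Monge-Elkan) similarity over whitespace-separated sub-strings."""
    s_parts, t_parts = s.split(), t.split()
    if not s_parts or not t_parts:
        return 0.0
    return sum(max(basic(a, b) for b in t_parts) for a in s_parts) / len(s_parts)

print(jaro_winkler("vehicule", "vehicules"), monge_elkan("vehicule blanc", "vehicule"))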
Statistical measures are used in different phases of our approach.
The first step identifies verb classes, which represent the sets of verb relations generated by the same verb. For each verb class, instances of the Verb and Verb, Preposition patterns are added to the set of roots. We argue that for verbs accepting prepositions, each verb, preposition structure is specific, and for this reason we create conceptual graphs for each of those structures. This step creates a number of conceptual graphs having one level, namely the root.
For each root, its arguments are identified: the terms that are its subjects and objects. This step adds a second layer to each conceptual graph.
As the arguments of a given verb can have different levels of granularity, a new level can be added to the conceptual graphs by clustering those arguments.
A cluster is a group of similar terms, having a central term C called the centroid and its k nearest neighbors. Based on the observation that the more words a term grouping contains, the more specific its meaning is, an algorithm is proposed to cluster the arguments of verb relations. This algorithm considers single-word arguments as centroids, and it uses the Monge-Elkan similarity coefficient to add terms to clusters.
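A minimal sketch of this clustering step, reusing the monge_elkan function from the previous sketch; the similarity threshold and the example terms are illustrative assumptions:

def cluster_arguments(arguments, threshold=0.7, similarity=None):
    """Cluster verb-relation arguments around single-word centroids."""
    sim = similarity or monge_elkan          # reuses the monge_elkan sketch above
    centroids = [a for a in arguments if len(a.split()) == 1]
    clusters = {c: [c] for c in centroids}
    for term in arguments:
        if term in clusters:
            continue
        best = max(centroids, key=lambda c: sim(term, c), default=None)
        if best is not None and sim(term, best) >= threshold:
            clusters[best].append(term)
    return clusters

# Hypothetical argument terms from the accident corpus; prints the clusters built
# around the single-word centroids 'vehicule' and 'feu'.
print(cluster_arguments(["vehicule", "vehicule blanc", "vehicule de service", "feu"]))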
A first experimentation of this approach was carried out in the field of road accidents. It uses a corpus composed of 250 accident reports which occurred in and around the Lille region.
We used an ontology of road accidents created from accident reports by using
Terminae (see [6]). This ontology is expressed in OWL (see [7]). It models about
450 concepts describing road accidents and 300 roles connecting those concepts.
Arguments that are objects of the circuler avec (circulate with) relation are labelled. The top-down strategy labels véhicule blanc (white vehicle) as inconnu (unknown).
The bottom-up strategy allows us to eliminate the centroid feu (fire), which is
labelled as inconnu (unknown). On the downside, clusters having a small number
of terms are penalized.
For the Jaro-Winkler coefficient, the results of the three strategies are similar to the results obtained with the Jaro coefficient. For the Jaccard coefficient, the bottom-up strategy shows a failure, as it assigns the term véhicule to the concept véhicule de service. Independently of the coefficient that is used, the top-down strategy performs faster.
For the same term, concept couple, the values of the Jaccard coefficient are slightly lower than the values of Jaro and Jaro-Winkler. Therefore, thresholds for the Jaccard coefficient need to be lower than the thresholds for the Jaro and Jaro-Winkler coefficients.
6 Related Work
References
1. Ceausu, V., Desprès, S.: Towards a text mining driven approach for terminology
construction. In: 7th International conference on Terminology and Knowledge
Engineering. (2005)
2. Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: Interna-
tional Conference on New Methods in Language Processing. (1994)
3. Roux, C., Prouxet, D., Rechenmann, F., Julliard, L.: An ontology enrichment
method for a pragmatic information extraction system gathering data on genetic
interactions. In: Ontology Learning Workshop at ECAI. (2000)
4. Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string distance metrics
for name-matching tasks. In: IJCAI-2003,Workshop on Information Integration on
the Web pages. (2003)
5. Monge, A., Elkan, C.: The field-matching problem: algorithm and applications.
In: Second International Conference on Knowledge Discovery and Data Mining.
(1996)
6. Biébow, B., Szulman, S.: A linguistic-based tool for the building of a domain
ontology. In: International Conference on Knowledge Engineering and Knowledge
Management. (1999)
7. Szulman, S., Biébow, B.: OWL et Terminae. In: 14ème Journée Francophone d'Ingénierie des Connaissances. (2004)
8. Faure, D., Nedellec, C.: Asium, learning subcategorization frames and restrictions
of selection. In: 10th European Conference On Machine Learning, Workshop on
text mining, Chemnitz, Germany (1998)
9. Schutz, A., Buitelaar, P.: Relext: A tool for relation extraction from text in ontology
extension. In: International Semantic Web Conference. (2005) 593–606
10. Alfonseca, E., Manandhar, S.: Improving an ontology refinement method with
hyponymy patterns. In: Third International Conference on Language Resources
and Evaluation. (2001)
11. Faatz, A., Steinmetz, R.: Ontology enrichment with texts from the www. In:
SemanticWeb Mining 2nd Workshop at ECML/PKDD. (2002)
12. Monge, A., Elkan, C.: An efficient domain-independent algorithm for detecting ap-
proximately duplicate database records. In: Workshop on data mining and knowl-
edge discovery, SIGMOD. (1997)
13. Miller, G.: Wordnet: A lexical database for english. CACM 38 (1995) 39–41
Building Corporate Knowledge Through Ontology
Integration
1 Justice Technology Services, Department of Justice,
Government of South Australia, 30, Wakefield Street, Adelaide, SA 5000, Australia
[email protected]
2 Science Applications International Corporation,
1 Introduction
ontology merging exercises. This would be of particular help for large enterprises with a frequent requirement for ontology merging, and it would enable the building of corporate knowledge, defined as the total knowledge acquired by an enterprise in its business dealings, which we represent under our formalism by the final merged ontology resulting from successive mergings of the enterprise's existing ontologies.
• Canon
In simple terms, a canon is a framework for knowledge organization that models the
real world through abstract concepts, relations and association rules between them.
Formally, we define a canon K as a tuple K = (T, I, ≤, conf, B) where:
(1) T is the union of the set of concept types TC and the set of relation types TR . We
assume that each of those sets is a lattice, in which every pair of types has a
unique supremum and a unique infimum. This assumption ensures mathematical
soundness of the structure and is common in lattice theory [10].
(2) I is the set of individual markers, which are real-life instances of concept types,
e.g. “John” is an individual marker of the concept type “Person”.
(3) ≤ is the subsumption relation in T, enabling definition of a hierarchy of concept
types and a hierarchy of relation types.
(4) conf is the conformity relation that relates each individual marker in I to a concept
type in TC . It in effect defines the (unique) infimum of all concept types that
could be used with an individual marker (called coreferent concept nodes in [1]),
e.g. the individual marker “John” may be associated through conf with the
concept type “Man”, which is the infimum of other concept types “Person”,
“Mammal” and “Living Entity”, and therefore “John” can be used as an instance
of those concepts, i.e., “John is a man, a person, a mammal and a living entity.”
(5) B is the Canonical Basis function that associates each relation type with an
ordered set of concept types that may be used with that relation type, e.g.,
“fatherOf” is a relation type that is associated through the function B with two
(identical) concept types: “Person” and “Person”, in which the first “Person” is
the father and the second “Person” is the child. B must also satisfy the association
rule: if two relation types are related (through the subsumption relation ≤) then
their transformations by B should also be related in the respective order.
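To make the definition concrete, here is a minimal Python sketch of the tuple K = (T, I, ≤, conf, B); the lattice requirement on T is not enforced, and all names are illustrative only:

from dataclasses import dataclass, field

@dataclass
class Canon:
    concept_types: set = field(default_factory=set)    # T_C
    relation_types: set = field(default_factory=set)   # T_R
    individuals: set = field(default_factory=set)      # I
    subtype: set = field(default_factory=set)          # <= as (subtype, supertype) pairs
    conf: dict = field(default_factory=dict)           # individual marker -> concept type
    basis: dict = field(default_factory=dict)          # relation type -> tuple of concept types (B)

    def leq(self, a, b):
        """Reflexive-transitive test of the subsumption relation <=."""
        return a == b or any(x == a and self.leq(y, b) for x, y in self.subtype)

# The example from the text: "John" conforms to "Man", which is subsumed by "Person", etc.
k = Canon(concept_types={"Man", "Person", "Mammal", "Living Entity"},
          relation_types={"fatherOf"},
          individuals={"John"},
          subtype={("Man", "Person"), ("Person", "Mammal"), ("Mammal", "Living Entity")},
          conf={"John": "Man"},
          basis={"fatherOf": ("Person", "Person")})
print(k.leq(k.conf["John"], "Living Entity"))   # True: John is a man, a person, a mammal and a living entity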
• Ontology
• Ontology Integration
Notes
(1) One notable feature of our method is that, as the canon is constantly enriched after each application of MultiOntoMerge, the missing data identified in Step 2 become less and less significant in subsequent ontology merging exercises, and manual intervention in Step 3 is less and less required. The ontology merging process therefore becomes increasingly automatic with each use.
(2) With regard to other ontology merging theories and techniques (such as Prompt,
FCA-Merge, Chimaera, etc.), most of them, if not all, are semi-automatic and require
interactions with knowledge engineers and experts during the construction of the
merged ontology. However, unlike our method, none of them explicitly leverages
knowledge gained from earlier ontology merging exercises to improve subsequent
ones. On the other hand, some of them may complement our method by assisting in
the automation of our “validation of missing data” step (i.e., Step 3).
(3) Our method enables discovery of new knowledge during the ontology merging
process (e.g., with the identification of missing concepts and creation of new concepts
in the merged ontology). This is similar to some other techniques, such as Formal
Concept Analysis [10] and Simple Conceptual Graph [1] (see its “lattice-theoretic
interpretation”).
Example
Fig. 3. Merged Ontology (after Semantic Compaction)
Fig. 4. Merged Ontology (after Semantic Completion)
(The figures show merged concept hierarchies rooted at Animal, with concept types including Bird, Fish-eater, Galah, Fish-eater-bird, Pelican and Cormorant.)
3 Conclusion
This paper presents a general domain-independent method to merge ontologies. In
addition to producing the merged ontology, our approach enables knowledge
contained in the input ontologies to be accumulated inside a common canon. It also
ensures that all resulting structures are semantically consistent, compact and
complete, as well as mathematically sound, so that formal reasoning could be
conducted. For an organization with a frequent need for ontology merging, the use of
such a common canon permits a consistent enterprise-wide classification of
knowledge across the diverse business units of the organization. The final merged
ontology represents the total knowledge of the organization and can be leveraged to
improve the organization’s business, e.g., to know which aspects of knowledge the
organization is dealing with and where corporate business strengths reside.
References
1. Chein, M., Mugnier, M.L.: “Concept Types and Coreference in Simple Conceptual Graphs”,
12th International Conference on Conceptual Structures, Huntsville, Alabama, USA (2004)
2. Corbett, D.: “Reasoning and Unification over Conceptual Graphs”, Kluwer Academic
Publishers, New York (2003)
3. Corbett, D.: “Filtering and Merging Knowledge Bases: a Formal Approach to Tailoring
Ontologies”, Revue d'Intelligence Artificielle, Special Issue on Tailored Information
Delivery, Cecile Paris and Nathalie Colineau (editors) (Sept./Oct. 2004) 463 – 481
4. Ganter, B., Wille, R.: “Formal Concept Analysis: Mathematical Foundations”, Springer,
Heidelberg, Germany, (1996-German version) (1999-English translation)
5. Nguyen, P., Corbett, D.: "A Basic Mathematical Framework for Conceptual Graphs," IEEE
Transactions on Knowledge and Data Engineering, Vol. 18, No. 2 (Feb. 2006) 261-271
6. Nguyen, P.: MultiOntoMerge Prototype: http://users.on.net/~pnguyen/cgi/multiontomerge.pl
7. Nicolas, S., Moulin, B., Mineau, G.: “sesei: a CG-Based Filter for Internet Search Engines”,
11th International Conference on Conceptual Structures, Dresden, Germany (2003)
8. Sowa, J.: “Knowledge Representation: Logical, Philosophical, and Computational
Foundations”, Brooks Cole Publishing Co., Pacific Grove, CA (1999)
9. Stumme, G., Maedche, A.: “FCA-Merge: Bottom-Up Merging of Ontologies”, 7th
International Conference on Artificial Intelligence, Seattle, USA (2001)
10. Wille, R.: “Restructuring Lattice Theory: an Approach based on Hierarchies of Concepts”,
Ordered Sets, I. Rival (ed.), Reidel, Dordrecht-Boston (1982).
Planning with Domain Rules Based on State-Independent
Activation Sets*
1 Introduction
Domain rules encode particular domain knowledge; new knowledge is often deduced from a known knowledge base by applying these rules. Domain rules, or domain axioms, have long been a hot topic in the AI planning community. Given an operator file and a problem file, a planning problem asks for a sequence of actions such that the initial state is transformed into the goal state by applying these actions in turn. The International Planning Competition (IPC), held biennially since 1998, is regarded as the top academic forum and planning systems competition in this field. Derived predicates are one of the two new features of the PDDL2.2 language [1], the standard competition language of the International Planning Competition 2004 (IPC-4). In classical planning, predicates are divided into two categories: basic and derived. While basic predicates may appear as effects of actions, derived ones may only be used in action preconditions or in the goal state [2]. Thus, derived predicates are not affected directly by domain actions; their truth in the current state is inferred from that of the basic predicates via domain axioms. PDDL2.2 introduces two benchmark domains containing derived predicates: PSR-Middle, and PROMELA (Philosophers and Optical-Telegraph) [3]. Nineteen planners joined the classical track of IPC-4; however, only four planners (LPG-td, SGPlan, Marvin, and Downward) attempted to solve the domains containing derived predicates.
* This research is funded by the Chinese Natural Science Foundation (60173039).
Therefore, the lack of the ability to deal with derived predicates actually prevented most planners from solving more competition problems.
Several methods have been used to deal with derived predicates in different planning systems; however, their feasibility and efficiency are limited. Compiling derived predicates away was proposed first, but it has recently been proved to involve a worst-case exponential blow-up in the size of the domain description or in the length of the shortest plans [2]. Gazen and Knoblock propose a pre-processing algorithm which transforms domain axioms into equivalent 'deduce' operators, but this may lead to inefficient planning [4, 5]. The LPG-td planner presents an approach to planning with derived predicates in which the search space consists of particular graphs of actions and rules, called rule-action graphs [5]. However, calculating the possible activation sets of a derived predicate in a rule graph is often enormous and tedious, because the activation sets have to be recalculated as soon as the current state changes. We therefore attempt to find state-independent activation sets of a derived predicate, computing them only once in a preprocessing phase. We implement our idea in a new planner, called FF-DP (FF-Derived Predicate), which is a modified version of FF2.3 [6].
In the rest of this paper, we first introduce the definition of the state-independent activation sets of a derived predicate. Then, we discuss how to calculate state-independent activation sets in a rule graph and how to use them in the relaxed plan. Finally, we evaluate FF-DP on some specific benchmark problems.
if on(x,y) ∨ ∃z (on(x,z) ∧ above(z,y)) then above(x,y)

Fig. 1. A blocks-world example (blocks B, C, D) illustrating the domain rule for above(x,y)
Given a rule r = (if Φ(x) then P(x)) and a tuple of constants c (|x| = |c|), we can derive an equivalent set R composed of grounded rules (containing no variables) by applying the transformation mechanism (more details in [5]). The set R can contain only basic facts or derived facts; for instance, on(A,B) and on(B,C) are basic facts, while above(A,B) and above(B,C) are derived facts. The set R consisting of grounded
rules can be transformed into an AND-OR graph, which LPG-td calls a "rule graph": AND-nodes (fact nodes) are either leaf nodes labeled by basic facts, or nodes labeled by derived facts; OR-nodes (rule nodes) are labeled by the grounded rules in R. For a rule node, its single in-edge comes from the derived predicate deduced by the rule, and its out-edges (often more than one) point to the triggering conditions of the rule. For example, in Fig. 2, the derived predicate above(A,B) can be triggered by rule r1, r2, or r3, and the triggering conditions of r2 are on(A,C) and above(C,B).
Fig. 2. The rule graph of the derived fact above(A,B) (rule nodes r1, r2, r3 and their triggering conditions)
Algorithm 1
SIAS-search (d, A, Path, Open)
Input: A derived fact (d), the state-independent
activation set under construction (A), the set of AND-
nodes of R on the search tree path from the search tree
root to d (Path), the set of nodes to visit for A (Open);
Output: The set of all state-independent activation sets
(∑).
The algorithm in Fig. 3 performs a complete search on the rule graph. The function first_element gets the first element x of the Open table. If x is a rule node, then one triggering condition of x enters the Open table. If x is a basic-fact node, then it immediately becomes an element of A. Once a triggering condition is totally supported by basic facts, the set A becomes an element of ∑. When x is a derived-fact node and does not appear in the Path table, the search continues recursively (line 14); otherwise, the branch is pruned to avoid cyclic search. Finally, to find all activation sets, the set A must retain its possible members (lines 19-20). For example, with this algorithm we can obtain the set ∑SIAS of the derived fact above(A,B) on the rule graph in Fig. 2 as follows: {{on(A,B)}, {on(A,C), on(C,B)}, {on(A,C), on(C,D), on(D,B)}, {on(A,D), on(D,B)}, {on(A,D), on(D,C), on(C,B)}}. Here, {on(A,B)} is also an activation set in ∑SIAS; however, it belongs to the current state and hence does not appear in the set ∑LPG.
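The full pseudo code of Fig. 3 is not reproduced in this fragment; the following Python sketch captures the idea under the assumption that the grounded rule set R is given as a mapping from each derived fact to its alternative triggering conditions. On the rule graph of Fig. 2 it yields exactly the set ∑SIAS listed above.

def sias_search(fact, rules, basic_facts, path=frozenset()):
    """Return all state-independent activation sets of a derived fact.

    rules: derived fact -> list of triggering conditions (each a list of facts).
    basic_facts: the facts that are leaf (basic-fact) nodes of the rule graph."""
    if fact in basic_facts:
        return [frozenset([fact])]
    if fact in path:                                   # prune cycles in the rule graph
        return []
    results = []
    for condition in rules.get(fact, []):
        partial = [frozenset()]                        # combine the sets of the condition's facts
        for f in condition:
            sub = sias_search(f, rules, basic_facts, path | {fact})
            partial = [p | s for p in partial for s in sub]
            if not partial:
                break
        results.extend(partial)
    return results

# Grounded rules for above(x,y) over the blocks {A, B, C, D} of the example:
objs = ["A", "B", "C", "D"]
rules = {f"above({x},{y})": [[f"on({x},{y})"]]
         + [[f"on({x},{z})", f"above({z},{y})"] for z in objs if z not in (x, y)]
         for x in objs for y in objs if x != y}
basic = {f"on({x},{y})" for x in objs for y in objs if x != y}
print(sias_search("above(A,B)", rules, basic))         # the five activation sets given above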
Next, we present a forward-search algorithm (depicted in Fig. 4) for building the relaxed plan in domains with derived predicates, where a state contains not only basic facts but also plenty of derived facts deduced from the domain rules.
Algorithm 2
Extend-relax-plan (I, G)
Input: The initial state (I), the goal state (G);
Output: The set of actions or fail.
1. S ← I ∪ D(I, R);
2. level ← 0;
3. For each action a which is applicable in S Do
4. S’ ← S ∪ Add(a);
5. level ← level + 1;
6. S’’ ← S’ ∪ D(S’, R);
7. If S’’ doesn’t contain G , Then S ← S’’ , GOTO 3
8. Else π = extract-relax-plan(I, G);
9. If π= ∅ Then return fail;
10.Else return Aset(π).
Extract-relax-plan (I, G)
Input: The initial state (I), the goal state (G);
Output: The set of actions in relax-plan (Act).
The algorithm in Fig. 4 is composed of two phases: extending the planning graph and extracting a plan solution. D(I, R) is the set of derived facts obtained as the closure of the set I under the rule set R (more details in [2]). Add(a) is the set of positive effects of an action node a, and pre(a) is the set of preconditions of an action node a. In the pre-processing phase, we first calculate the set ∑ for every derived fact d by calling SIAS-search(d, ∅, ∅, ∅) and store it in a lookup table. In the relaxed-plan phase, each derived fact is then replaced by its best state-independent activation set (SIAS). A best SIAS can be defined as the activation set whose basic facts can be reached from the initial state with the minimal set of actions. By building a lookup table that stores the state-independent activation sets, we avoid spending a lot of time on calculating state-dependent activation sets.
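A minimal sketch of the closure computation D(S, R) used in lines 1 and 6 of Algorithm 2, assuming the same rule representation as in the previous sketch:

def derived_closure(facts, rules):
    """D(S, R): apply the grounded rules until no new derived fact can be added."""
    state = set(facts)
    changed = True
    while changed:
        changed = False
        for derived, conditions in rules.items():
            if derived not in state and any(all(f in state for f in cond) for cond in conditions):
                state.add(derived)
                changed = True
    return state - set(facts)          # only the derived facts; the caller forms S ∪ D(S, R)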
[Chart] PSR-Middle-DerivedStrips: CPU time in sec. (logarithmic scale) per problem instance.
Fig. 5. Performance of FF-DP in the domain PSR-Middle
[Chart] PROMELA-Optical_Telegraph-DerivedADL: CPU time in sec. (logarithmic scale) per problem instance; planners compared: LPG-td, FF-DP, Marvin, Downward.
5 Conclusion
References
[1] S. Edelkamp and J. Hoffmann. PDDL2.2: The Language for the Classical Part of the 4th International Planning Competition. T.R. no. 195, Institut für Informatik, Freiburg, Germany, 2004.
[2] S. Thiebaux, J. Hoffmann, B. Nebel. In Defense of PDDL Axioms. Artificial Intelligence.
Volume 168 (1-2), 2005, pp. 38 - 69.
[3] J. Hoffmann, S. Edelkamp, R. Englert, F. Liporace, S. Thiebaux, and S. Trueg. Towards Realistic Benchmarks for Planning: the Domains Used in the Classical Part of IPC-4 - Extended Abstract. Proceedings of the 4th International Planning Competition (IPC-4), June 2004, Whistler, Canada, pp. 7-14.
[4] M. Davidson, M. Garagnani. Pre-processing planning domains containing Language
Axioms. In Grant, T., & Witteveen, C. (eds.), Proc. of the 21st Workshop of the UK
Planning and Scheduling SIG (PlanSIG-02). pp. 23-34. Delft (NL), Nov. '02.
[5] Alfonso Gerevini, Alessandro Saetti, Ivan Serina, Paolo Toninelli. Planning with Derived
Predicates through Rule-Action Graphs and Relaxed-Plan Heuristics. R.T. 2005-01-40.
[6] Jörg Hoffmann, Bernhard Nebel. The FF Planning System: Fast Plan Generation through
Heuristic Search. Journal of Artificial Intelligence Research 14 (2001), pp.253-302.
Elicitation of Non-functional Requirement Preference for
Actors of Usecase from Domain Model
1 Introduction
have done separately. We have suggested a method of combining both: extracting the variants in the usecase and combining them with the domain model and the non-functional taxonomy to derive the actor's preferences. The system architecture is shown in Fig. 1.
Fig. 1. System architecture (components: State Chart, Actor Events Elicitor, Actor Events, Goal-based questionnaires, NFR Extractor, Actor Preference, Non-functional requirements for the actors)
The usecase and domain model are structured using the XML editor according to the Document Type Definition (DTD) format. The editor checks the syntax of both the usecase and the domain model against the specified structure. They are represented with specified notations for easy traceability of the usecase description to the domain model. The syntactic structure of the usecase description is validated using natural language processing. The various syntactic structures used in the usecase and domain model are listed in Table 1.
The Usecase-Domain Mapping Wizard extracts the entities that are not present in the domain model from the usecase description by mapping the usecase onto the domain model. A usecase follows the structure below.
Title: a label that uniquely identifies the use case within the usecase model.
Primary Actor: the actor that initiates the use case.
Participants: other actors participating in the use case.
Goal: primary actor expectation at the successful completion of the use case.
Precondition: condition that must hold before an instance of usecase can be executed.
Postcondition: condition that must be true at the end of a 'successful' execution of an
instance of the use case.
Steps: Sequence of steps involved in the usecase along with extension.
Extensions: a set of step extensions that applies to all the steps in the use case.
1 An entity consists of one or more words, specified as Word1, ..., Wordn. The sequence of words must correspond to a concept, an attribute of a concept in the domain model, an instance of a concept, or a reference to an attribute of an instance.
2 conjugated_action_verb is the action_verb used in the concept operation declaration, in the present tense.
3 binding_word may be a possessive adjective, an article or a preposition.
Enhancing Information Retrieval Using Problem Specific
Knowledge
1 Introduction
1 http://www.pubmed.gov
the document. Unfortunately, current IR systems are not able to use such problem (task) specific knowledge effectively. They do offer features like searching for keywords in the Title, URL, etc.; however, these features do not cover the types of cases outlined above.
In general, it is fair to say that current IR systems are primarily focused on improving retrieval algorithms, and they pay less attention to the task of properly acquiring the information need of a user in the first place [1, 5, 6, 7]. We believe that future IR systems should be knowledge-driven, where an information retrieval process is enhanced by properly acquiring the information need and problem specific knowledge from a user.
Most of the current IR systems use very similar (and one could even say traditional) form-based user interfaces [1]. Considering that many users today have access to powerful machines, we believe more advanced interactive user interfaces should be explored in order to improve IR experiences. However, there are very few systems [8, 9] today that try to use such advanced user interfaces.
[Figure: system overview. A search engine (like Yahoo, Google, etc.) returns search results, which the Problem Specific Knowledge (PSK) module annotates with attributes based on the PSK.]
In the near future, we plan to extend this list by including more commonly used
attributes from the field of Natural Language Processing (NLP) [2]. However, as
discussed in the next section, we believe the above attributes are powerful enough to
significantly enhance IR experiences for a large number of problems.
The PSK module uses GATE to generate (calculate) the NLP-related attributes. GATE (General Architecture for Text Engineering) is a widely used and very popular free, open-source framework (or SDK) that is successfully used for all sorts of language processing tasks.
Graphical User Interface Module: The problem specific attributes generated by the
module PSK, along with the corresponding documents are sent to the “Graphical User
Interface” module. The aim here is to plot charts based on the values of the problem
specific attributes. For example, we can plot a chart where an X-axis represents first
occurrences of the word “tutorial” and Y-axis represents first occurrences of the word
“prolog”. By plotting such charts, a user can locate documents where these values are
either say very small or very large (depending on the requirements). Alternatively, a
user may not be interested in documents where one value is say small and another say
high, and so on. The module allows a user to quickly plot such charts to view
documents in a variety of ways. Each point on a chart represents a document, and it
has a hyperlink to the corresponding document. In other words, a user can simply
click on a point in a chart and the corresponding document will be displayed in a
browser window. This would allow a user to interactively browse through a large
collection of search results, using problem specific attribute values.
2 http://gate.ac.uk/
Fig. 2. Chart outlining the relationship between two problem specific attributes: prolog-1p (X-
axis) and tutorial-1p (Y-axis)
Fig. 3. Chart outlining the relationship between two problem specific attributes: prolog-tutorial-
1p (X-axis) and prolog-1p (Y-axis)
It should be noted that here the whole process is knowledge-driven. A user defines
problem specific attributes and he/she also selects attributes for charting a graph. This
is different to classical clustering methods (which are data-driven) where documents
are grouped together using predefined clustering criteria. In this paper we will only
focus on knowledge-driven explorations. Here we simply note that if required, it is
also possible to cluster documents based on their problem specific attributes for data-
driven explorations. The module uses Weka’s Visualisation tool3 to display charts.
Considering that Weka offers tools for popular data mining techniques, it would not
be difficult to also include data-driven approaches in the future. However, in this
paper we want to emphasise knowledge-driven approaches, which are often neglected in the IR literature.
4 Experimental Results
In this section we present experimental results that demonstrate how problem specific attributes can be used for knowledge-driven exploration of retrieved documents. For the following query, we use the Yahoo search engine, retrieve the first 100 results, and restrict our search to edu.au (to avoid possible advertisement material).
Let us first continue our previous example, where we were interested in searching for a prolog tutorial. For this task, we created the following attributes:
• Paragraph number where the word “prolog” (ignoring case) appears first
time in a document, let’s call it prolog-1p
• Paragraph number where the word “tutorial” (ignoring case) appears first
time in a document, let’s call it tutorial-1p
• Paragraph number where both the words “prolog” (ignoring case) and
“tutorial” appear first time in the same paragraph, in a document, let’s call
it prolog-tutorial-1p
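A minimal Python sketch of how such attributes might be computed for a retrieved document; splitting the text into paragraphs on blank lines is an assumption (the actual system derives document structure with GATE):

import re

def first_paragraph_index(text, *words, not_found=50):
    """Index of the first paragraph containing all given words (case-insensitive);
    returns the sentinel value 50 used in these experiments when none is found."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for idx, para in enumerate(paragraphs):
        lowered = para.lower()
        if all(w.lower() in lowered for w in words):
            return idx
    return not_found

doc = "An introduction to logic programming.\n\nThis Prolog tutorial covers lists and recursion."
print(first_paragraph_index(doc, "prolog"),             # prolog-1p
      first_paragraph_index(doc, "tutorial"),           # tutorial-1p
      first_paragraph_index(doc, "prolog", "tutorial")) # prolog-tutorial-1p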
The form-based graphical user interface allows a user to create the above attributes. After calculating the attribute values, the system sends them, along with the document references, to Weka's Visualisation tool (appropriately modified for the system). Initially the system displays thumbnails of the possible charts. Here a user can quickly examine different thumbnails to look for possibly interesting patterns (relationships) between problem specific attributes. By clicking on a thumbnail, a user can display the corresponding chart. Fig. 2 shows such a chart for the attributes prolog-1p and tutorial-1p. Similarly, Fig. 3 shows the relationship between the attributes prolog-tutorial-1p and prolog-1p.
Given that we are looking for a prolog tutorial, we might be most interested in exploring documents that are close to the origins (bottom-left corners) of these charts. We can also infer other useful information from these charts. For example, in Fig. 2, documents that appear on or close to the diagonal axis have both attributes appearing first in the same or nearby paragraphs, increasing the likelihood of them being prolog tutorials. Alternatively, we could say that in Fig. 2 documents with a low x-value (prolog-1p) and a high y-value (tutorial-1p) might be referring to some other material on prolog (like lecture notes on AI). Similarly, we
3 Weka is open-source software for data mining and visualisation, available at http://www.cs.waikato.ac.nz/ml/weka/
could say that documents with a high x-value (prolog-1p) and a low y-value (tutorial-1p) might be referring to tutorials on other topics (again, say, AI tutorials).
In the charts there are some documents with attribute values of 50. In this experiment, if we cannot find the required term in the first 10,000 words (or there is a parsing problem), we assign 50 to that attribute, to indicate that the corresponding value is too high for our purposes and to simplify the chart displays. Also note that a paragraph index starts at 0 and a Yahoo rank starts at 1.
We manually checked all 100 documents returned by Yahoo! for their relevance to our task and marked each as relevant or not. Out of the 100 documents retrieved, 35 (35%) are relevant to the task. Based on the Yahoo ranking, 70% of the top 10 documents are relevant, 65% of the top 20 documents are relevant and 53.3% of the top 30 documents are relevant.
In Fig. 2, there are 26 documents on or near the diagonal axis. Out of these 26 documents, where the absolute difference between the paragraph indexes is less than 1, 16 are relevant, that is, 61.5%. Eleven of these 16 relevant documents have Yahoo ranks greater than 30; in other words, they would not appear in the first three pages of the search results. This is useful because a user can browse the top-ranking Yahoo hits (say the first few pages) and then use the charts to look for more relevant documents, or vice versa.
In Fig. 3, there are 20 documents that appear near the origin; in other words, these are the documents where both words appear in one paragraph at or near the beginning of the document. Of these 20 documents, 65% are relevant. Of these relevant documents, 5 have Yahoo ranks greater than 30.
Popular search engines like Yahoo do use word proximity as one of their retrieval
criteria. However, the final ranking is based on multiple criteria and hence it is not
easy to identify documents with a specific structure or that satisfy a specific criterion.
The approach presented in this paper enhances a retrieval process by combining
problem specific knowledge with the underlying strength of a given search engine.
In the experiments presented above, the criterion used in Fig. 2 is more relaxed than that used in Fig. 3, and therefore we believe there are more relevant hits for Fig. 2. This might also be the reason why there are more relevant documents in Fig. 2 with Yahoo ranks greater than 30.
In summary, we believe that current IR systems do not take the role of the user very seriously in defining the information need and in later navigating through possible solutions. More research is needed to design and develop innovative approaches that keep users in the loop and actively seek more domain (problem) specific knowledge that can be effectively used to narrow the search space or improve the matching criteria. The system presented in this paper is just the beginning of a bigger goal of building a smart interactive information retrieval system.
References
1. Belkin, N., et al.: Evaluating Interactive Information Retrieval Systems: Opportunities and Challenges. In: Conference on Human Factors in Computing Systems (CHI'04). 2004: ACM Press, New York, USA.
2. Jackson, P. and I. Moulinier: Natural Language Processing for Online Applications. 2002: John Benjamins Publishing Company.
3. Ruthven, I.; Re-examining the Potential Effectiveness of Interactive Query Expansion; in
SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval. 2003. Toronto, Canada: ACM.
4. Salton, G. and C. Buckley; Improving retrieval performance by relevance feedback;
Journal of the American Society for Information Science, 1990. 41(4): p. 288-297.
5. Shen, X., B. Tan, and C. Zhai; Context-Sensitive Information Retrieval Using Implicit
Feedback; in Proceedings of the 28th annual international ACM SIGIR conference on
Research and development in information retrieval SIGIR '05. 2005. Brazil: ACM Press.
6. Stojanovic, N.; On the Role of a User's Knowledge Gap in an Information Retrieval
Process. in Proceedings of the 3rd International Conference on Knowledge Capture (K-
CAP 2005). 2005. Banff, Alberta, Canada: ACM.
7. Voorhees, E.H. and D. Harman; Overview of the sixth text retrieval conference (TREC-6).
in Information Processing and Management,. 2000.
8. KartOO visual meta search engine [http://www.kartoo.com]
9. Vivisimo’s clustering [http://www.vivisimo.com]
10. Müller HM, Kenny EE, Sternberg PW (2004); Textpresso: An Ontology-Based
Information Retrieval and Extraction System for Biological Literature. PLoS Biol 2(11):
e309
Acquiring Innovation Knowledge
Computing Department,
Division of Information and Communication Sciences,
Macquarie University, Australia
{busch, richards}@ics.mq.edu.au
1 Introduction
Many today would accept that the Western organisation is no longer competitive from
the point of view of secondary industry. Although both primary and tertiary industry
must be conducted onshore, to attain a global advantage at the quaternary and quinary
levels requires innovation. Naturally attaining a competitive advantage is easier said
than done for “innovation is… a significant and complex dimension of learning in
work, involving a mix of rational, intuitive, emotional and social processes embedded
in activities of a particular community of practice” [5, p.123]. We too see innovation
taking place as a process whereby knowledge may be gained either through self
experience over time or by serving in an ‘apprenticeship’ with a more experienced
innovator who may pass some of his or her expertise on. Nevertheless, innovation is not simply a process of trial-and-error rooted in experience; innovation needs to produce timely and ongoing results "involving a complex mix of tacit knowledge, implicit learning processes and intuition" [5, p.124]. Given the acknowledgement of
the connection between tacit knowledge and innovation knowledge [9], we have
turned our research using work-place scenarios to capture tacit knowledge toward the
capture of related innovation knowledge.
2 The Approach
The approach carries on and extends our previous work [2, 3, 4] with a narrowing of
focus to innovative and creative type knowledge and a change of direction into the
You work for an internet company whose founder and chief executive routinely
abuses and demoralises people. You and your fellow employees dread coming to work
with this tyrannical executive, but you know that he has a great idea that can be
packaged for a hot initial public offering in the next 12 months.
Do you:
1) Wait until the company goes public and its stock options vest then get out of there
as quickly as possible.
2) Reduce annual leave and join another company. You don't have to take that kind
of abuse.
3) Steal his idea and make some subtle readjustments to make it better then start
your own internet company. With any luck you'll be able to bankrupt him and
make a lot of money in the process.
4) Stay with the company for as long as they'll have you. Company loyalty is always
appreciated, and the executive's ideas have merit even if he is a jerk.
5) Approach the chief executive and tell him firmly but politely that you don't
appreciate his behaviour towards you and the rest of his staff.
6) Don't take his insults lying down, rise to the occasion and return them with
interest.
7) Try to find out what the executive's real problem is. It may turn out to have
nothing to do with you and rather be connected to personal problems. In which
case, you won't feel that you are incompetent at your job.
3 An Evaluation Study
As Information and Communication Technology (ICT) is our area of expertise, we will initially focus on innovation in this field. To compare novices with expert innovators, we use two sample populations. First, a third-year undergraduate 'management theory' class of 75 individuals with a median age of 21 forms our novice population; secondly, approximately a dozen recognized innovators, aged from 30 to 80, provide a skilled sample data set to compare against. To be recognised as an innovator, as opposed to merely claiming to be one, implies a process of public scrutiny. The individuals we will be approaching will by definition generally fit within the category of people experienced at what they do. With the incorporation of biographical information into the first component of the inventory, we hope to find differences in the answering of the scenarios on the basis of gender, employment seniority, LOTE (Language Other Than English), highest formal qualification obtained and amount of ICT experience. Naturally the last two factors will not be high for the novice group, given the age group we are dealing with.
expect. Analysis of the results revealed that all respondents took the innovation
knowledge inventory seriously and none took a neutral ‘Neither Good nor Bad’ Likert
scale option all the way through the questionnaire. To maintain concentration and
thereby increase data validity, respondents were given only 4 randomly assigned
scenarios along with the biographical component of the questionnaire.
Let us briefly examine the results for part of the inventory, in this case for scenario 12. With regard to answer 1 ("Wait until the company goes public and its stock options vest then get out of there as quickly as possible"), our respondents were ethically rather ambivalent, hovering around neither good nor bad, but realistically this option was considered on the whole to be a good idea.
With regard to answer 2 (“Reduce annual leave and join another company. You
don't have to take that kind of abuse”), the respondents were ethically positive, but
realistically more negative. In other words whilst this option might seem an okay
thing to do, our respondents felt in practice this was not such a good idea.
Answer 3 for Scenario 12 (“Steal his idea and make some subtle readjustments to
make it better then start your own internet company. With any luck you'll be able to
bankrupt him and make a lot of money in the process”) presents the most interesting
result. There is clearly a very strong skew toward answering this question in the
negative from an ideal or ethical point of view, but our undergraduates feel this option is not so bad in practice, with a small majority actually considering the idea positive.
With regard to answer 4 (“Stay with the company for as long as they'll have you.
Company loyalty is always appreciated, and the executive's ideas have merit even if
he is a jerk”), our novice population is evenly spread with regard to this situation from
an idealistic point of view. In practice however the novices are inclined toward
considering this option a bad idea.
In answering 5 ("Approach the chief executive and tell him firmly but politely that you don't appreciate his behaviour towards you and the rest of his staff"), the undergraduates feel this is a very good idea ideally speaking. In practice, however, they seem a little more reserved, with a small minority even considering this an extremely bad idea.
Answer 6 (“Don't take his insults lying down, rise to the occasion and return them
with interest”) is taken negatively on the whole by our student sample. What is
interesting is that a larger than usual group of ‘fence sitters’ takes a neutral stance
(‘Neither Good nor Bad’) on this question. Only a small minority consider this option
a good idea both ideally and in practice.
Finally, answer 7 (“Try to find out what the executive's real problem is. It may turn
out to have nothing to do with you and rather be connected to personal problems. In
which case, you won't feel that you are incompetent at your job”) was interesting
insofar as nobody considered it an extremely bad idea. People were generally
comfortable with answer 7, and while some took a neutral stance, on the whole this idea
was received positively both ideally and in practice.
The actual responses of the novices are not of direct interest to us. We are firstly
interested to see whether the novices respond like experts and, if not, what it is that
the experts do differently. Scenario 12, used in this example, was developed from
one of the case studies recorded in [1]. It is interesting to note that option 1 was in
fact what the innovator historically chose, though he comments that this option was not
very innovative. Instead he recommends option 3 as the most innovative option. This
is very interesting, because our novices revealed a strong tendency toward viewing
intellectual property theft as ethically a bad idea, but starting one's own internet
company and bankrupting the competition as a good one in practice. Clearly our novices
and our expert have very different views.
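The kind of novice-versus-expert comparison described above could, for instance, be
automated along the following lines. This is again only a hedged sketch under assumed
data structures (a mapping from answer options to the novices' practical ratings, and
the single option the expert recommends); it is not the analysis the study itself
performed.

    from collections import Counter

    # Assumed labels counted as a positive practical rating.
    POSITIVE = {"Good", "Extremely Good"}

    def modal(ratings):
        """Most common Likert rating among the novices for one answer option."""
        return Counter(ratings).most_common(1)[0][0]

    def novice_expert_gaps(novice_practical, expert_choice):
        """Flag answer options where the novices' modal practical rating is positive
        although the option is not the expert's recommendation, or vice versa.
        Both arguments are hypothetical structures used purely for illustration."""
        gaps = []
        for answer, ratings in novice_practical.items():
            novices_positive = modal(ratings) in POSITIVE
            if novices_positive != (answer == expert_choice):
                gaps.append(answer)
        return gaps

    # Assumed toy data only: the expert is taken to recommend option 3.
    print(novice_expert_gaps({1: ["Good", "Good", "Bad"],
                              3: ["Good", "Bad", "Good"]}, expert_choice=3))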
Remember that an important part of our research using the inventory was
identifying the novelty generation stage a given scenario was at. In the case of
scenario 12, our management students were somewhat divided with regard to the
scenario's innovation development stage. Five students felt the scenario was
focusing on idea generation, with one of these believing the scenario was concerned
with opportunity recognition at the same time. The majority of novices (10 out of 23)
felt scenario 12 was about opportunity recognition. Two out of 23 felt the scenario
was dealing with the development stage, and finally five students felt the scenario was
dealing with the realisation stage of innovation.
References
[1] Bell, G., McNamara, J.F. (1991) High-Tech Ventures: The Guide for Entrepreneurial
Success, Perseus Books Publishing L.L.C., New York, U.S.A.
[2] Busch, P., Richards, D. (2003) “Building and Utilising an IT Tacit Knowledge
Inventory” Proceedings of the 14th Australasian Conference on Information Systems
(ACIS2003), November 26-28, Perth, Australia.
[3] Busch, P., Richards, D. (2004) “Acquisition of articulable tacit knowledge” Proceedings
of the Pacific Knowledge Acquisition Workshop (PKAW'04), in conjunction with the 8th
Pacific Rim International Conference on Artificial Intelligence, August 9-13, 2004,
Auckland, New Zealand, 87-101.
[4] Busch, P., Richards, D. (2005) “An Approach to Understand, Capture and Nurture
Creativity and Innovation Knowledge” Proceedings of the 15th Australasian Conference on
Information Systems (ACIS2005), November 30-December 2, Sydney, Australia.
[5] Fenwick, T. (2003) “Innovation: examining workplace learning in new enterprises”
Journal of Workplace Learning 15(3):123-132.
[6] Ganter, B., Wille, R. (1999) Formal Concept Analysis: Mathematical Foundations,
Springer-Verlag, Berlin, Germany.
[7] Gough, H. (1981) “Studies of the Myers-Briggs Type Indicator in a Personality
Assessment Research Institute” Fourth National Conference on the Myers-Briggs Type
Indicator, Stanford University, CA, July 1981.
[8] Kirton, M. (2001) “Adaptors and Innovators: why new initiatives get blocked” in
J. Henry (ed.) Creative Management, 2nd Edition, Cromwell Press Ltd, London, 169-180.
[9] Leonard, D., Sensiper, S. (1998) “The role of tacit knowledge in group innovation”
California Management Review (Berkeley), Spring, 40(3) (electronic).
[10] Schweizer, T.S. (2004) An Individual Psychology of Novelty-Seeking, Creativity and
Innovation, ERIM Ph.D. Series, 48.
[11] Sternberg, R., Wagner, R., Williams, W., Horvath, J. (1995) “Testing common sense”
American Psychologist 50(11):912-927.
Author Index