Domain Ontologies, Part I
Domain Ontologies
Matteo Cristani University of Verona, Italy Roberta Cuel University of Verona, Italy
INTRODUCTION
In conceptual modeling we need to consider a general level of abstraction at which the domain of interest is formalized independently of the specific application for which the conceptual modeling process is performed. This leads to an integrated approach that takes into account both knowledge about a domain and metaknowledge about a methodology. Knowledge about a domain is represented by a system of concepts and instances that reify the knowledge managed within that domain, whereas metaknowledge about a methodology is the description of the knowledge deriving from the method used. For instance, when a technology is used to unveil ontologies within a specific domain, the knowledge about the domain is the resulting ontology, and the metaknowledge about the methodology is the description of the method used to construct it. This article describes a novel method for the creation of both upper level and specific domain ontologies, called the bidirectional method for developing ontologies. The method guides the developer toward ontologies that result from combining top-down and bottom-up approaches. The top-down approach focuses on conceptual modeling through armchair research (drawing on philosophical, psychological, and sociological aspects) and produces a formal draft schema. The bottom-up approach employs automatic (or semiautomatic) extraction of categories, taxonomies, partonomies, and dependency graphs, in particular from linguistic corpora of documents related to the topics of the domain.
BACKGROUND
Formal ontologies are a popular research topic in several communities, such as knowledge management, knowledge engineering, natural language processing, artificial intelligence (AI), and others (Fensel, 2000). Formal ontology can be defined as the systematic, formal, axiomatic development of the logic of all forms and modes of being (Cocchiarella, 1991). More generally, we employ the term formal ontology to designate an explicit specification of a shared conceptualization that holds in a particular context. In other words, an ontology provides an explicit conceptualization that describes the semantics of data, providing a shared and common understanding of a domain (from an AI perspective, see the definitions of Gruber, 1998, and Jasper & Uschold, 1999).
Ontologies are used to manage knowledge within and among communities, to manage and organize corporate knowledge bases, and to negotiate meanings among individuals. Moreover, ontologies are used to share knowledge among people and among heterogeneous and widely spread application systems, such as semantic-Web applications (Schwartz, 2003). They are employed in projects, as conceptual models, to enable content-based access to corporate knowledge memories, knowledge bases, or data warehouses. They are employed to allow agents to understand each other when they need to interact, communicate, and negotiate meanings. Finally, they let agents refer to common information and share a common understanding of its structure.
In computer science, knowledge management, knowledge representation, and other fields, several languages and tools exist to help final users and system developers create good and effective ontologies. In particular, various tools help people create, manually or semiautomatically, categories, partonomies, taxonomies, and other organization levels of ontologies. The generally accepted term for these tools is ontology editors.
Some of them are open source, such as Protégé-2000, KAON, and SWOOP, and others are commercial suites for knowledge management based on ontology development, such as the tools provided by the On-to-Knowledge project (for an in-depth description, see http://protege.stanford.edu, http://kaon.semanticweb.org/, http://www.mindswap.org/2004/SWOOP/, http://www.ontoknowledge.org/index.shtml).
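To make the organization levels mentioned above concrete, the following minimal Python sketch stores a taxonomy (is-a links) and a partonomy (part-of links) side by side. It is only an illustration: the class name `Ontology`, its fields, and the example categories are assumptions made here, they do not reproduce the data model of any of the editors listed, and the is-a links are assumed to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container for two organization levels of an ontology."""
    # Taxonomy: each category points to its more general category.
    is_a: dict = field(default_factory=dict)
    # Partonomy: each part points to the whole it belongs to.
    part_of: dict = field(default_factory=dict)

    def ancestors(self, category):
        """Follow is-a links upward from a category (assumes no cycles)."""
        chain = []
        while category in self.is_a:
            category = self.is_a[category]
            chain.append(category)
        return chain


# Hypothetical example content, chosen only for illustration.
onto = Ontology()
onto.is_a.update({"Car": "Vehicle", "Vehicle": "Artifact"})
onto.part_of.update({"Engine": "Car", "Wheel": "Car"})

car_ancestors = onto.ancestors("Car")
car_parts = sorted(p for p, whole in onto.part_of.items() if whole == "Car")
print(car_ancestors, car_parts)
```

Keeping the two relations in separate mappings reflects the point made in the text that taxonomies and partonomies are distinct organization levels, even when they range over the same categories.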
Copyright 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
lar, Uschold's (2000) methodology, which proposes codification in a formal language, and methontology, which constructs an ontology through a sequence of intermediate representations that are finally translated into the actual object (Fernández, Gómez-Pérez, & Juristo, 1997), are the most representative. Here are short descriptions of some important methodologies.
One of the first modules of the foundational ontologies library is the descriptive ontology for linguistic and cognitive engineering (DOLCE). DOLCE is an ontology of particulars and refers to cognitive artefacts that depend on human perception, cultural imprints, and social conventions. This ontology derives from armchair research, referring in particular to enduring and durable entities from the philosophical literature. The authors' main idea is to develop not a monolithic module but a library of ontologies (WonderWeb Foundation Ontologies Library) that allows agents to understand one another without forcing them to interoperate through the adoption of a single ontology (Masolo, Borgo, Gangemi, Guarino, & Oltramari, 2002). Finally, basic functions and relations (according to the methodology introduced by Gangemi, Pisanelli, & Steve, 1998) should be general enough to be applied to multiple domains, be sufficiently intuitive and well studied in the philosophical literature, and hold as soon as their relata are given, without mediating additional entities.
In Gatius and Rodríguez (1996), the authors developed a three-step process (natural-language interface generator [GISE]) to build a domain ontology: the building and maintenance of general linguistic knowledge, a definition of the application in terms of the conceptual ontology, and a definition of the control structure.
The control structure includes the metarules for mapping objects in the domain ontology to those in the task ontology, the metarules for mapping the conceptual ontology onto the linguistic ontology, and those that allow the generation of the specific interface knowledge sources, mainly the grammar and the lexicon.
One of the most famous ontology-design environments is methontology. It tries to define the necessary activities that people carry out when building an ontology (Fernández et al., 1997). In other words, it is a flow of ontology development for three different processes: management, technology, and support. The ontology-development process is composed of the following steps: project-management activities that include planning, control, and quality assurance; development-oriented activities that include specification, conceptualization, formalization, and implementation; and support activities that include knowledge acquisition, evaluation, integration, and documentation.
Lauser, Wildemann, Poulos, Fisseha, Keizer, and Katz (2002) use the multilingual methontology methodology defined by Fernández et al. (1997) and enrich it by stressing specific actions that support the creation process for ontology-driven conceptual analysis. The domain ontology is built by using two different knowledge-acquisition approaches: the creation of the core ontology and the derivation of the domain ontology from a thesaurus. The first basically comprises the first three steps of the methontology development activities, defining a list of frequent terms and a list of domain-specific documents to analyze. The second consists of descriptive keywords linked by a basic set of relationships. The goal of this step is to refine an RDFS ontology model to develop a pruned ontology and a list of frequent terms.
Toronto Virtual Enterprise (TOVE) is a methodology for ontological engineering that allows the developer to build an ontology by following these steps: scenario motivation, ontology requirements definition, terminology specification, formal description of requirements, axiom specification, and completeness theorems (Fox & Gruninger, 1994, 1998).
Ontology Development 101 has been developed by authors involved in these ontology-editing environments: Protégé-2000, Ontolingua, and Chimaera (Noy & McGuinness, 2001). They propose a very simple guide, based on iterative design, that helps developers create an ontology using these tools.
The steps to develop an ontology are to determine the domain and scope of the ontology, consider reusing existing ontologies (e.g., Ontolingua ontology library, DAML ontology library, UNSPSC, RosettaNet, and DMOZ), enumerate important terms in the ontology, define the classes and the class hierarchy, define the properties of classes (slots), define the facets of the slots, and create instances.
Uschold's (2000) methodology uses a formal language for building ontologies via a purely manual process: identifying purpose and scope, capturing (the identification of key concepts and relationships, and the provision of definitions), and finally coding the ontology (committing to the basic terms
for ontology), integrating existing ontologies, evaluating, and documenting the ontology processes.
The On-to-Knowledge (OTK) methodology focuses on the application-driven development of ontologies during the introduction of ontology-based knowledge-management systems (Fensel, van Harmelen, Klein, & Akkermans, 2000; Lau & Sure, 2002; Sure, Erdmann, Angele, Staab, Studer, & Wenke, 2002). It is based on the following steps: a feasibility study, an impacts and improvements study for the selected target solution, a kickoff phase, a refinement phase, a formalization phase, an evaluation phase, and an application and evolution phase. This methodology stresses the need to ensure organizational acceptance and the integration of knowledge systems. It is therefore based on bottom-up strategies and on gathering insights into the interrelationships between the business task, the actors involved, and the use of knowledge for successful performance.
Izumi and Yamaguchi (2002) have used the business-object ontology to develop an ontology for business coordination. They constructed the business-activity thesaurus by employing WordNet as a general lexical repository. They constructed the business-object ontology in the following way: concentrating on the case-study models of e-business and extracting the taxonomy; counting the number of appearances of each noun concept; comparing the noun hierarchy of WordNet with the taxonomy obtained and adding the counts for similar concepts; choosing the main concepts with high scores as upper concepts and building upper ontologies by giving all the nouns the formal is-a relation; and merging all the noun hierarchies extracted from the whole process.
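The counting-and-scoring steps of the last procedure can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the `HYPERNYM` dictionary is a hand-written, hypothetical stand-in for the WordNet noun hierarchy, "similar concepts" is reduced here to a direct hypernym link, and the threshold is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical stand-in for a WordNet noun hierarchy: concept -> hypernym.
HYPERNYM = {
    "invoice": "document",
    "contract": "document",
    "buyer": "person",
    "seller": "person",
}


def score_concepts(nouns, hypernym):
    """Count each noun, then add its count to the score of its hypernym."""
    direct = Counter(nouns)
    scores = Counter(direct)  # copy, so direct counts stay unchanged
    for noun, count in direct.items():
        parent = hypernym.get(noun)
        if parent is not None:
            scores[parent] += count
    return scores


def upper_concepts(scores, threshold):
    """Select the concepts whose score reaches the (assumed) threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)


# Nouns as they might be extracted from e-business case-study models.
nouns = ["invoice", "buyer", "invoice", "contract", "seller", "document"]
scores = score_concepts(nouns, HYPERNYM)
uppers = upper_concepts(scores, threshold=3)
print(scores["document"], scores["person"], uppers)
```

In this toy run the general concept accumulates the counts of its more specific nouns, which is why it, and not any single frequent noun, is selected as an upper concept.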
totype models (represented by methontology). Both approaches have benefits and drawbacks: The first one seems more appropriate when the purposes and requirements of the ontology are clear, and the second one is more useful when the environment is dynamic and difficult to understand. Finally, the informal description of the ontology and its formal embodiment in an ontology language are often developed in separate stages, and this separation increases the gap between real-world models and executable systems. There is no one correct way to model a domain; there are always viable alternatives. Most of the time, the best solution depends on the application that the developer has in mind and the tools that he or she uses to develop the ontology. In particular, we can notice that the need for correspondence between existing methodologies and environments for building ontologies has these consequences: conceptual models are implicit in the implementation code, so a reengineering process is usually required to make them explicit; ontological commitments and design criteria are implicit in the ontology code; and ontology developers' preferences for a given language condition the implementation of the acquired knowledge.
represent, in a neutral way, the real world. In fact, in the real world or in practical applications (e.g., information systems, knowledge-management systems, portals, and other ICT applications), general and universal categories are not widely used. This is partly due to the difficulty of implementing a general ontology within specific domains. Moreover, general and universal categories are very abstract and can lead to heterogeneous interpretations and different conceptualizations. For instance, everyone has a different interpretation and conceptualization of love, trust, or spatial-temporal regions. Besides this, the more abstract a concept is, the more difficult it is to define. Workers under pressure from their daily activities might therefore find it difficult or useless to make the concepts they use every day more abstract and decontextualized. That is, they might prefer to reach, in a short time, an effective agreement on the shared spaces in their office rather than spend days talking about space regions. More often, it is simply too expensive to create complex, complete, and general ontologies.
Another important reason for the above-mentioned lack of general and supposedly complete ontologies in real-world applications is that, in the same project or domain, people might use different ontologies composed of several combinations of categories. Indeed, different ontologies might use different categories or systems of categories to describe the same kinds of entities. Even worse, two ontologies may use the same names or systems of categories for different kinds of entities. In fact, when trying to measure the similarity between two ontologies, it is necessary to proceed at both the lexical layer and the conceptual layer (Maedche & Staab, 2002). Therefore, it might be that two entities with different definitions are intended to be the same, but the task of proving that they are indeed the same may be difficult, if not impossible (see Sowa, 2000).
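The difference between the two layers can be illustrated with a small Python sketch. It is not the measure defined by Maedche and Staab (2002): here the lexical layer is approximated by the Jaccard overlap of the term sets, and the conceptual layer by the average Jaccard overlap of the ancestor sets of the shared terms. Both choices, the function names, and the example ontologies are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def terms(is_a):
    """All names used in a taxonomy given as a child -> parent mapping."""
    return set(is_a) | set(is_a.values())


def ancestors(term, is_a):
    """Set of categories reachable from a term by following is-a links."""
    seen = set()
    while term in is_a and is_a[term] not in seen:
        term = is_a[term]
        seen.add(term)
    return seen


def lexical_similarity(o1, o2):
    return jaccard(terms(o1), terms(o2))


def conceptual_similarity(o1, o2):
    shared = terms(o1) & terms(o2)
    if not shared:
        return 0.0
    overlaps = [jaccard(ancestors(t, o1), ancestors(t, o2)) for t in shared]
    return sum(overlaps) / len(overlaps)


# Two hypothetical ontologies that use the name "bank" for different entities.
finance = {"bank": "organization", "loan": "contract"}
geography = {"bank": "landform", "loan": "contract"}

lexical = lexical_similarity(finance, geography)
conceptual = conceptual_similarity(finance, geography)
print(lexical, conceptual)
```

The shared name "bank" raises the lexical score, while its different placement in the two hierarchies lowers the conceptual one, which is the situation described above of the same name standing for different kinds of entities.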
The basic reason for these behaviours is that what we know cannot be viewed simply as a unique picture of the world since it always presupposes some degree of interpretation. Indeed, depending on different interpretation schemas, people (with different perspectives, aims, and world interpretations) may use the same categories with different meanings, or different words to mean the same thing. For example, two groups of people may observe the same phenomenon, but still see different problems, different opportunities, and different challenges. This essential feature of knowledge was studied from different perspectives, and the interpretation schemas have been given various names, for example, paradigms in Kuhn (1979), frames in Goffman (1974), thought worlds in Dougherty (1992), contexts in Ghidini and Giunchiglia (2001), mental spaces in Fauconnier (1985), and cognitive paths in Weick (1979). This view, in which the
explicit part of what we know gets its meaning from a (typically implicit, or taken for granted) interpretation schema, leads to some important consequences regarding the adoption and the use of categories and ontologies. An ontology is not a neutral organization of categories; it is the emergence of some interpretation schema according to which it makes sense to organize and define things. In summary, an ontology is always the result of a sense-making process (conceptual modeling) and represents the point of view (the knowledge representation) of those who took part in that process (see Benerecetti, Bouquet, & Ghidini, 2000, for an in-depth discussion of the dimensions along which any representation, including an ontology, can vary depending on contextual factors). Moreover, according to a structuration approach (for an in-depth discussion, see, for example, Giddens, 1984; Orlikowski, 1992; Orlikowski & Gash, 1994), technology cannot be considered neutral with respect to organizational structures and the managing of knowledge. Ontologies can shape knowledge sharing and managing processes, and organizational behaviours can affect the concrete appropriation of technology. Therefore, there is no one correct way to model a domain; there are always viable alternatives. Mainly, the best solution depends on the application that the developer has in mind, the system of artefacts that she or he wants to integrate with the ontology, and the tools that she or he uses to develop the ontology. Indeed, most of the tools only support designing and implementing ontologies; they do not support all the activities of the ontology life cycle. Besides this, most of the existing methodologies for building ontologies depend on their environments. Therefore, conceptual models are implicit in the implementation code, and a reengineering process is usually required to make the conceptual models explicit.
Ontological commitments and design criteria are implicit in the ontology code, and ontology developer preferences, in a given language, condition the implementation of the acquired knowledge (Gruber, 1998).
the very effective development of a top-level ontology. Taking these aspects into consideration, it seems useful to consider that manually constructed ontologies are time consuming, labour intensive, and error prone (Ding & Foo, 2002), but they are necessary to define a domain in which the quality and the general comprehension of the ontology are good. For example, experts might provide the system with a small number of seed words that represent high-level concepts. These concepts can emerge from theoretical ideas and knowledge or from the practice and experience of specialized workers. In short, the steps defined by the bidirectional method for developing ontologies are the following:
Plan phase: The goals, the amount of resources needed for the ontology development, and the constraints (e.g., languages, timing, computational power, type of software used to describe the ontology) are defined. It is important to notice that there is a trade-off between the computational complexity (which is domain independent) and the expressive potential defined by the language.
Introspective phase: The draft schema, such as the general specifications, categories, and relations; its formalization into the chosen formal language; and its demonstration are defined. It is important to notice that this phase is based on references to the literature (philosophical, linguistic, psychological, sociological literature) and on armchair research.
Bottom-up phase: The draft terminology is automatically or semiautomatically generated, the description of relations among terms is extracted, and the refinement of the draft terminology is handled. The lexical analysis is developed partly in an automated way (through the extraction of phrases containing seed words in documents, archives, and so on) and partly through experience (expert discussion; domain experts can help the developer to refine the draft terminology). Notice that this phase is based on very precise domain knowledge and on semiautomatic ontology generation, which depends, in particular, on data-mining processes, syntactic analysis systems, and so on.
Provision of basic axioms: A set of ontology definitions is obtained through domain-expert interviews or participation.
Validation phase: The set of definitions is tested, validated, and used.
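The automated part of the bottom-up phase, the extraction of phrases containing seed words, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `extract_seed_phrases`, the fixed word window, the simple tokenizer, and the two example sentences are all hypothetical, and a real system would rely on the data-mining and syntactic analysis tools mentioned above.

```python
import re


def extract_seed_phrases(documents, seed_words, window=2):
    """Return, for every seed-word occurrence, the surrounding word window."""
    seeds = {word.lower() for word in seed_words}
    phrases = []
    for text in documents:
        # Naive tokenizer: lowercase alphabetic runs only.
        tokens = re.findall(r"[A-Za-z]+", text.lower())
        for i, token in enumerate(tokens):
            if token in seeds:
                start = max(0, i - window)
                phrases.append(" ".join(tokens[start:i + window + 1]))
    return phrases


# Hypothetical corpus and seed word supplied by a domain expert.
docs = [
    "The supplier sends an invoice to the customer.",
    "Each customer order is archived.",
]
phrases = extract_seed_phrases(docs, ["customer"], window=1)
print(phrases)
```

The resulting phrases form only a draft terminology; in the method described above they would then be refined with the help of domain experts.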
The analysis above gives the bidirectional method for developing ontologies the character of a metamethodology, namely, a methodology for making the right choice among different possible methodologies. This is practically useful in ontology construction within complex organizations. In fact, within big organizations, knowledge is managed according to different perspectives, and specialized knowledge is managed in the way that best suits specific needs. The bidirectional method for developing ontologies sustains the creation of very specialized, specific, and different domain ontologies, allowing a high level of flexibility in ontology-construction processes. Moreover, it allows one to manage a complicated ontological commitment that in practice is rooted in dynamic contents and in specific methods. Contents and metaknowledge for ontology construction can be managed and modified only at execution time, namely, at the moment in which the ontology is created. Figure 1 gives a schematic analysis of the phases. Each phase is related in terms of the direct dependency on previous phases. The metaknowledge included in the ontology methodology adopted is rendered explicit by the definition of these dependencies. Note that the resulting phase set is minimal with respect to the possible phases in existing methodologies, and that the execution of the phase sequence is cyclic in order to provide a model for reusability.
[Figure 1. Schematic of the phases: plan phase, introspective phase, bottom-up phase, provision of axioms, validation phase.]
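The cyclic execution of the phase sequence can be sketched in Python as follows. The sketch assumes, for illustration only, that the phases form a simple linear chain in which each phase depends directly on the one before it and that validation feeds back into a new plan phase; the actual dependencies are those drawn in Figure 1, which is not reproduced here. The function name `run_cycles` is hypothetical.

```python
from itertools import cycle, islice

# Phase names as given in the text; the linear order is an assumption.
PHASES = [
    "plan",
    "introspective",
    "bottom-up",
    "provision of axioms",
    "validation",
]


def run_cycles(phases, n_cycles):
    """Yield (cycle_number, phase) pairs, restarting after the last phase."""
    total = n_cycles * len(phases)
    for step, phase in enumerate(islice(cycle(phases), total)):
        yield step // len(phases) + 1, phase


schedule = list(run_cycles(PHASES, 2))
for cycle_number, phase in schedule:
    print(cycle_number, phase)
```

Each pass through the sequence ends with validation and is followed by a new plan phase, which is one simple way to read the statement that the sequence is cyclic in order to support reuse.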
positories in a corporate knowledge-management system, which may include a data warehouse, a corporate Web portal, and intranet tools for accessing distributed data. The major value of a systematic definition of these aspects is the opportunity to measure the quality of an ontology from a social and organizational point of view. Although the aim of this investigation is not, so far, to obtain metrics and evaluation methods for ontologies, we maintain that such results will become available shortly once the methods for building ontologies have been defined. An important observation is that there are three different situations for ontology development, at different levels of difficulty: development from scratch, development as the completion of an existing ontology, and development as a merge (or coordination or alignment) of several ontologies. The three cases require different methodologies, and the methodology presented in this article is valid only for the first case. The other two cases are interesting extensions of the knowledge-management perspective that is the focus of the article, and they deserve deep analysis. However, we believe that such analysis is possible only when a flexible methodology is used for the from-scratch case.
FUTURE TRENDS
This article discusses a methodology for building a domain ontology from scratch. The intention of the investigation is to show that extracting an ontology from a corpus (or from many corpora) is a tenable solution in certain cases, and that the opposite way, based on deep thinking about the topics of the domain, is acceptable in other cases. The methodology we propose here is able to help the developer in discriminating between the two cases and provides a general schema for that purpose. The development of ontologies as an integration of existing models and as a merge has to be investigated further. Other aspects that have to be covered are how this methodology can affect updating processes and how it can allow the integration of domain ontologies into an upper level ontology. Finally, the evaluation and measurement of ontologies are interesting issues, in order to quantify how much an ontology costs and how it can improve the daily activities of workers. In particular, taking into account organizational and management studies, an ontology should be evaluated in terms of its ability to satisfy and its effectiveness. The needs and solutions discussed above can be deployed in technologies for knowledge management both as CASE (computer-aided software engineering) tools in the context of ontology creation, where the tool helps the developer do the right thing at the right moment, and as knowledge-sharing and meaning-negotiation tools, especially in network systems.
[Figure 2. Labels recovered from the figure: GISE, Top-Down Approach, Business-Object Ontology, Naïve Approaches, 101, OTK, TOVE, High.]
REFERENCES
Benerecetti, M., Bouquet, P., & Ghidini, C. (2000). Contextual reasoning distilled. Journal of Theoretical and Experimental Artificial Intelligence, 12(3), 279-305.
Cocchiarella, N. (1991). Formal ontology. In B. Smith & H. Burkhardt (Eds.), Handbook of metaphysics and ontology (pp. 640-647). Munich, Germany: Philosophia Verlag.
Ding, Y., & Foo, S. (2002). Ontology research and development: Part 1. A review of ontology generation. Journal of Information Science, 3(28), 123-136.
Dougherty, D. (1992). Interpretative barriers to successful product innovation in large firms. Organization Science, 3(2).
Fauconnier, G. (1985). Mental spaces: Aspects of meaning construction in natural language. Cambridge, MA: MIT Press.
Fensel, D. (2000). Ontologies: A silver bullet for knowledge management and electronic commerce. Berlin: Springer-Verlag.
Fensel, D., van Harmelen, M., Klein, M., & Akkermans, H. (2000). On-to-knowledge: Ontology-based tools for knowledge management. Proceedings of eBusiness and eWork, Madrid, Spain.
Fernández, M., Gómez-Pérez, A., & Juristo, N. (1997). METHONTOLOGY: From ontological art towards ontological engineering. Working Notes of the AAAI Spring Symposium on Ontological Engineering.
Fox, M. S., & Gruninger, M. (1994). Ontologies for enterprise integration. Proceedings of the 2nd Conference on Cooperative Information Systems, Toronto, Canada.
Fox, M. S., & Gruninger, M. (1998). Enterprise modelling. AI Magazine, 109-121.
Gangemi, A., Pisanelli, D. M., & Steve, G. (1998). Ontology integration: Experiences with medical terminologies. In N. Guarino (Ed.), Formal ontology in information systems (pp. 163-178). Amsterdam: IOS Press.
Gatius, M., & Rodríguez, H. (1996). A domain-restricted task-guided natural language interface generator. Proceedings of the 2nd Edition of the Workshop Flexible Query Answering Systems (FQAS'96).
Ghidini, C., & Giunchiglia, F. (2001). Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence, 127(2), 221-259.
CONCLUSION
The major claim of this article is that an explicit representation of the methodological knowledge employed to provide conceptual analysis and to express the model of knowledge by means of formal ontologies is a valuable asset for knowledge management. In the article, the major existing methodologies are described, and it is shown that each one provides a framework for making tenable decisions about the correct approach to be used in each specific case, depending both on the knowledge type and on the domain. Although many different methodologies for ontology creation are used in different domains, a good methodological approach should not change depending on the domain in which it is applied or on the type of technology that is used (see Figure 2). A metamethodology is needed, one that the developer can keep using even if the domain, the tool for ontology creation, the needs, and so on change over time. In particular, bottom-up and top-down approaches are both very important and are both used in different stages of ontology creation. The bidirectional method for developing ontologies gives an explicit answer to the need to merge the two approaches, accounting for the need for tenable, if not optimal, trade-offs between the stability of the model of knowledge and the dynamism of the knowledge itself, which is the actual reason for devising an explicit methodology.