The Revival of Structural Subsumption in Tableau-Based Description Logic Reasoners
1 Introduction
In practical applications involving ontologies, the goal should be to formulate important tasks as reasoning problems and to solve these problems with existing description logic inference systems. In our experience, expressive description logics such as ALC or SHIQ are very often used for specific tasks of this kind. For large parts of the application, however, a description logic such as EL or perhaps ELH might be sufficient. For practical applications it is a realistic assumption that description logic reasoning systems should be tailored towards Tboxes consisting, on the one hand, of very large sets of (acyclic) concept definitions with concept descriptions from, for instance, the ELH language and, on the other hand, of a smaller set of axioms involving ALC or SHIQ concept descriptions. In our experience, supporting this kind of Tbox is of utmost importance for the acceptance of description logic reasoners in industrial contexts.
It has been argued in the literature [1–3] that less expressive languages have their merits for applications using very large Tboxes, because subsumption in these languages is a polynomial inference problem. Recently, non-obvious results have also been achieved that identify subsumption as a polynomial problem even in the presence of generalized concept inclusions (GCIs) for the description logic EL++ [4–6].
Subsumption is an important inference problem in many application contexts, and it is the predominant inference problem used at development time, when a Tbox is classified. Classification of large Tboxes is well investigated but, from a practical point of view, still not an easy inference problem. For this task, very promising results have been reported for algorithms that exploit the results in [6] and are implemented in the CEL description logic system [7]. In that work, among other knowledge bases, the authors consider a huge Tbox, a variant of SNOMED-CT [8] with approximately 380,000 concept names, for a large part of which corresponding concept definitions exist. If an application that uses a Tbox of this size is to be built, it might be necessary to add concept definitions using concept descriptions from ALC or even SHIQ in order to solve certain application tasks by description logic reasoning. At the current state of the art, however, description logic inference systems tailored to the EL language (and its extension EL++) can then no longer be used for classification. The problem is that one often cannot have the cake (description logic systems ensuring fast classification) and eat it too (i.e., use the expressive power of description logics for solving application problems).
* This paper has been partially funded by the TONES Project, FP6-7603, 6th EU Framework Programme.
In our experience, value restrictions, for instance, are indeed used in applications, and they are supported by W3C syntaxes for description logic languages (such as OWL Lite or OWL DL). Following our line of argumentation, SHIQ reasoners such as FaCT++, Pellet, or RacerPro thus seem to be appropriate for these kinds of applications. However, experiments have indicated that Tboxes of the SNOMED family are hard for these reasoners [5].
In this paper we summarize our experiences with optimization techniques for well-known tableau-based reasoning systems and analyze the performance of very simple techniques for coping with Tboxes whose bulk axioms use only a less expressive language such as ELH, whereas some small parts of the Tbox use a language at least as expressive as ALC. The techniques analyzed in this paper have been tested with RacerPro, but they can be embedded seamlessly into other tableau-based reasoners such as FaCT++ or Pellet.
The paper is structured as follows. In Section 3 we first give a deeper introduction to the problems of tableau-based reasoning techniques in the context of classifying ELH Tboxes. Then, in Section 4, we specify a technique that fits well into tableau-based reasoning systems and that, as our experiments in Section 5 show, can solve the classification problem for important example Tboxes of the class considered in this paper. Section 6 concludes the paper and analyzes the pros and cons of the approach.
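$$
\begin{aligned}
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(C \sqcup D)^{\mathcal{I}} &= C^{\mathcal{I}} \cup D^{\mathcal{I}} \\
(\neg C)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\exists R.C)^{\mathcal{I}} &= \{x \mid \exists y.\,(x, y) \in R^{\mathcal{I}} \text{ and } y \in C^{\mathcal{I}}\} \\
(\forall R.C)^{\mathcal{I}} &= \{x \mid \forall y.\ \text{if } (x, y) \in R^{\mathcal{I}} \text{ then } y \in C^{\mathcal{I}}\}
\end{aligned}
$$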
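To make these equations concrete, the following Python sketch computes the extension of a concept over a small finite interpretation. The tuple encoding of concepts and the names `Interpretation` and `extension` are illustrative assumptions, not part of RacerPro or any other reasoner; the sketch simply has one case per semantic equation.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union

# Illustrative concept encoding (an assumption, not taken from the paper):
#   "A"               concept name A
#   ("and", C, D)     C ⊓ D
#   ("or", C, D)      C ⊔ D
#   ("not", C)        ¬C
#   ("some", "R", C)  ∃R.C
#   ("all", "R", C)   ∀R.C
Concept = Union[str, tuple]


@dataclass
class Interpretation:
    domain: Set[str]                        # Δ^I
    concepts: Dict[str, Set[str]]           # A^I for each concept name A
    roles: Dict[str, Set[Tuple[str, str]]]  # R^I for each role name R


def extension(c: Concept, i: Interpretation) -> Set[str]:
    """Compute C^I by structural recursion, one case per semantic equation."""
    if isinstance(c, str):
        return set(i.concepts.get(c, set()))
    op = c[0]
    if op == "and":
        return extension(c[1], i) & extension(c[2], i)
    if op == "or":
        return extension(c[1], i) | extension(c[2], i)
    if op == "not":
        return i.domain - extension(c[1], i)
    if op in ("some", "all"):
        pairs = i.roles.get(c[1], set())
        fillers = extension(c[2], i)
        if op == "some":
            # x has at least one R-successor y with y in C^I
            return {x for x in i.domain if any((x, y) in pairs for y in fillers)}
        # every R-successor y of x lies in C^I (vacuously true without successors)
        return {x for x in i.domain
                if all(y in fillers for (x2, y) in pairs if x2 == x)}
    raise ValueError(f"unknown constructor: {op!r}")
```

For instance, with domain {a, b, c}, C^I = {b} and R^I = {(a, b), (c, a)}, the sketch yields {a} for ∃R.C and {a, b} for ∀R.C; b has no R-successor and therefore satisfies ∀R.C vacuously.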
Algorithm 1 pmodels mergable?((L1, ¬L1, S1∃, S1∀), (L2, ¬L2, S2∃, S2∀))
  return (L1 ∩ ¬L2 = ∅) ∧ (¬L1 ∩ L2 = ∅)
    ∧ not exists (∃R.C) ∈ S1∃ such that
        there exists (∀S.D) ∈ S2∀ with S ∈ ancestors(R)
    ∧ not exists (∃R.C) ∈ S2∃ such that
        there exists (∀S.D) ∈ S1∀ with S ∈ ancestors(R)
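Algorithm 1 can be transcribed into Python roughly as follows. This is a sketch under stated assumptions: pseudo models are encoded as four frozensets (hypothetical class `PseudoModel`), the role hierarchy is a map from each role to its told super-roles, and `ancestors(R)` is taken to be reflexive, so that ∃R.C and ∀R.D count as interacting. As we understand the pseudo-model technique [16], only a positive answer is conclusive; a negative answer leaves the decision to the tableau prover.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Mapping, Set, Tuple


@dataclass(frozen=True)
class PseudoModel:
    pos: FrozenSet[str] = frozenset()                 # L
    neg: FrozenSet[str] = frozenset()                 # ¬L (names occurring negated)
    exists: FrozenSet[Tuple[str, str]] = frozenset()  # S∃ as (R, C) for ∃R.C
    forall: FrozenSet[Tuple[str, str]] = frozenset()  # S∀ as (S, D) for ∀S.D


def ancestors(role: str, hierarchy: Mapping[str, List[str]]) -> Set[str]:
    """Reflexive-transitive closure of the told super-roles (assumed semantics)."""
    seen, stack = {role}, [role]
    while stack:
        for sup in hierarchy.get(stack.pop(), []):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def pmodels_mergable(m1: PseudoModel, m2: PseudoModel,
                     hierarchy: Mapping[str, List[str]]) -> bool:
    """Algorithm 1: True if the two pseudo models show no potential interaction."""
    # Atomic clash: a name occurs positively in one model and negatively in the other.
    if m1.pos & m2.neg or m1.neg & m2.pos:
        return False

    def interacts(a: PseudoModel, b: PseudoModel) -> bool:
        # Some ∃R.C in a meets some ∀S.D in b with S ∈ ancestors(R).
        forall_roles = {s for (s, _) in b.forall}
        return any(ancestors(r, hierarchy) & forall_roles for (r, _) in a.exists)

    return not interacts(m1, m2) and not interacts(m2, m1)
```

Note the direction of the role test: with R ⊑ S, a value restriction ∀S.D constrains the R-successor created by ∃R.C, but with S ⊑ R it does not.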
Algorithm 2 pmodel embeddable?((L1, {}, S1∃, {}), (L2, {}, S2∃, {}))
  return filter(L1) ⊆ L2
    ∧ for all (∃R.C) ∈ S1∃
        there exists (∃S.D) ∈ S2∃ such that
          R ∈ ancestors(S)
          ∧ pmodel embeddable?(pmodel(C), pmodel(D))
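A direct Python rendering of Algorithm 2 might look like the sketch below. Assumptions: every existential filler is a concept name whose pseudo model is precomputed in `pmodels`; `ancestors` is again reflexive; and since the explanation of filter is cut off in this excerpt, the sketch takes it as a caller-supplied parameter `filter_atoms` instead of guessing its definition. Like the original, the recursion is naive and worst-case exponential, and it terminates only for acyclic definitions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Mapping, Set, Tuple


@dataclass(frozen=True)
class ELPseudoModel:
    """Pseudo model for the ELH case: ¬L = {} and S∀ = {}."""
    atoms: FrozenSet[str]                             # L
    exists: FrozenSet[Tuple[str, str]] = frozenset()  # S∃ as (role, filler name)


def role_ancestors(role: str, hierarchy: Mapping[str, List[str]]) -> Set[str]:
    """Reflexive-transitive closure of the told super-roles (assumed semantics)."""
    seen, stack = {role}, [role]
    while stack:
        for sup in hierarchy.get(stack.pop(), []):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def pmodel_embeddable(m1: ELPseudoModel, m2: ELPseudoModel,
                      pmodels: Mapping[str, ELPseudoModel],
                      hierarchy: Mapping[str, List[str]],
                      filter_atoms: Callable[[FrozenSet[str]], FrozenSet[str]]) -> bool:
    """Naive structural test: does m1 (candidate subsumer) embed into m2 (candidate subsumee)?"""
    if not filter_atoms(m1.atoms) <= m2.atoms:
        return False
    # Every ∃R.C of m1 needs a matching ∃S.D of m2 with S ⊑ R and D's model embedding C's.
    return all(
        any(
            r in role_ancestors(s, hierarchy)
            and pmodel_embeddable(pmodels[c], pmodels[d], pmodels, hierarchy, filter_atoms)
            for (s, d) in m2.exists
        )
        for (r, c) in m1.exists
    )
```

For the example Tbox {A ≐ ∃R.C, A1 ⊑ B ⊓ ∃R.C, A2 ≐ B ⊓ A} discussed next, with hand-built pseudo models and a hypothetical filter that simply drops the defined names A and A2, the call with the pseudo model of A2 first and that of A1 second returns True (A1 ⊑ A2), and the reverse call returns False.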
Note that we do not claim that the idea behind Algorithm 2 is new. We present the algorithm here in order to discuss details that arise in the context of pseudo models.
The function filter is used for the following purpose. Given a Tbox
$$\{A \doteq \exists R.C,\; A_1 \sqsubseteq B \sqcap \exists R.C,\; A_2 \doteq B \sqcap A\}$$
The function meta constraint? checks whether GCI absorption was successful, i.e., it tests whether any GCIs are left after GCI absorption. The function elh concept? exploits results from a static analysis of the axioms in the Tbox: for a concept name A, it determines whether all subconcepts “reachable” via concept definitions or GCIs are ELH concepts. The function self referencing? determines whether a concept name A is cyclic with respect to the Tbox axioms.
³ We assume that taxonomic encoding [13] is not used.
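The two static analyses described above can be sketched as graph traversals over the Tbox. In this sketch the Tbox is, by assumption, a map from each concept name to the right-hand sides of its axioms (definitions, primitive inclusions, or GCIs absorbed into that name), using the same illustrative tuple encoding of concepts as before; `elh_concept` and `self_referencing` are hypothetical Python stand-ins for elh concept? and self referencing?, not RacerPro's implementation.

```python
from typing import Dict, List, Set, Union

# Hypothetical encoding: a concept is a name string or a tuple such as
# ("and", C, D), ("or", C, D), ("not", C), ("some", "R", C), ("all", "R", C).
# The Tbox maps each concept name to the right-hand sides of its axioms
# (A ≐ C, A ⊑ C, or GCIs absorbed into A).
Concept = Union[str, tuple]
Tbox = Dict[str, List[Concept]]

ELH_CONSTRUCTORS = {"and", "some"}


def children(c: tuple) -> tuple:
    """Sub-descriptions of a compound concept (role names are skipped)."""
    return c[2:] if c[0] in ("some", "all") else c[1:]


def is_elh(c: Concept) -> bool:
    if isinstance(c, str):
        return True
    return c[0] in ELH_CONSTRUCTORS and all(is_elh(x) for x in children(c))


def names_in(c: Concept) -> Set[str]:
    if isinstance(c, str):
        return {c}
    return set().union(*(names_in(x) for x in children(c)))


def reachable_names(a: str, tbox: Tbox) -> Set[str]:
    """Concept names reachable from a by unfolding axioms (a itself only if cyclic)."""
    seen: Set[str] = set()
    stack = [a]
    while stack:
        for rhs in tbox.get(stack.pop(), []):
            for n in names_in(rhs):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
    return seen


def elh_concept(a: str, tbox: Tbox) -> bool:
    """Every axiom reachable from a uses only ELH constructors."""
    todo = {a} | reachable_names(a, tbox)
    return all(is_elh(rhs) for n in todo for rhs in tbox.get(n, []))


def self_referencing(a: str, tbox: Tbox) -> bool:
    """a is cyclic if it can be reached again by unfolding its own axioms."""
    return a in reachable_names(a, tbox)
```

Both analyses depend only on the Tbox, so their results can be computed once, before classification starts, and cached per concept name.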
In Algorithm 4 the subsumption test is defined. The function shiq subsumes?
applies the standard subsumption test with optimization techniques as described
in [12–14].
Algorithm 4 subsumes?(A1, A2, T)
  if pmodel embedding applicable?(A1, A2, T) then
    return pmodel embeddable?(pmodel(A1), pmodel(A2))
  else
    return shiq subsumes?(A1, A2, T)
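Algorithm 4 amounts to a guarded fast path. The sketch below keeps its three components as caller-supplied functions, because the applicability check of Algorithm 3 and the full SHIQ tableau test are not reproduced in this excerpt; the parameter names are hypothetical.

```python
def subsumes(a1, a2, tbox, applicable, embeddable, shiq_subsumes):
    """Sketch of Algorithm 4: True iff a1 subsumes a2 (a2 ⊑ a1 w.r.t. tbox).

    applicable(a1, a2, tbox)    stands in for pmodel embedding applicable?
    embeddable(a1, a2)          stands in for pmodel embeddable?(pmodel(a1), pmodel(a2))
    shiq_subsumes(a1, a2, tbox) stands in for the standard tableau-based test
    """
    if applicable(a1, a2, tbox):
        # Cheap structural test, used only when the static analysis admits it.
        return embeddable(a1, a2)
    # Otherwise fall back to the complete tableau-based subsumption test.
    return shiq_subsumes(a1, a2, tbox)
```

Correctness rests entirely on the guard: the structural answer is returned only when the static analysis has established that pmodel embeddable? decides subsumption correctly for the two concepts, and in every other case the tableau test decides.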
5 Evaluation
We conducted several experiments to test the hypothesis that Algorithm 2, although worst-case exponential, dramatically speeds up classification of SNOMED-ELH even when some GCIs use ALC concept descriptions.
Table 1 shows the results we obtained for classifying SNOMED-ELH, with approximately 380,000 concept names, using RacerPro 1.9.2 beta (Intel Core 2 Duo, 2.4 GHz, Mac OS X Leopard, 64-bit).
Fig. 2. A SNOMED classification test (Intel Pentium IV, 2.8 GHz, Linux, 32-bit) with an increasing number of value restrictions added to the definitions of randomly selected parents of ⊥. Only the first 10% of SNOMED is used for this test (37,900 concept names).
dramatic. A few value restrictions can cause long classification times. The effect is less dramatic if there are “few” concept names in the Tbox (see Figure 2). We are aware that RacerPro cannot even be used in cases where CEL can still be expected to be quite fast, i.e., if the Tbox uses the extensions included in EL++. In addition, the optimization techniques presented in this paper cannot be used in the presence of cycles. However, the cases in which the techniques can be used are detected automatically. CEL, in turn, cannot be run at all in the spoiled case.
6 Conclusion
The techniques investigated in this paper are attractive for tableau-based description logic systems because they are very easy to implement. Data structures that are computed anyway (pseudo models) are now also used for model embedding, a kind of structural subsumption test. The proposed technique requires a static analysis to implement the applicability check defined in Algorithm 3. We conjecture that the information needed in Algorithm 3 is already computed by all optimized tableau-based reasoners existing today.
The evaluation with SNOMED-ELH shows encouraging results. Nevertheless, the tests reveal that even a relatively small number of value restrictions can cause problems with large Tboxes. Still, the approach lets us handle Tboxes whose bulk is ELH while small parts meet sporadic SHIQ requirements.
Furthermore, we found it surprising that an exponential algorithm, namely a (naive) structural subsumption test, can be used to classify Tboxes such as SNOMED-ELH. With pmodel embeddable? the “right structures” are made available. The potential problem of many non-effective recursive calls of pmodel embeddable? does not occur in practical Tboxes, and SNOMED-ELH seems to be a good example of this effect. An implementation of a polynomial algorithm for pmodel embeddable? and its evaluation in the context of SNOMED-ELH are left for future work.
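One standard way to tame the naive recursion, sketched here as an assumption rather than as the polynomial algorithm the authors have in mind, is to memoize pmodel embeddable? on pairs of concept names. If every filler is named and the definitions are acyclic, there are at most n² distinct calls for n names, each doing work polynomial in the Tbox size, so the test as a whole becomes polynomial. The names `PM` and `make_cached_embeddable` are hypothetical, and `filter_atoms` is again a caller-supplied stand-in for filter.

```python
from functools import lru_cache
from typing import Callable, FrozenSet, List, Mapping, NamedTuple, Tuple


class PM(NamedTuple):
    atoms: FrozenSet[str]               # L
    exists: FrozenSet[Tuple[str, str]]  # S∃ as (role, filler name)


def make_cached_embeddable(pmodels: Mapping[str, PM],
                           hierarchy: Mapping[str, List[str]],
                           filter_atoms: Callable[[FrozenSet[str]], FrozenSet[str]]):
    """Return a memoized embedding test over concept names (acyclic Tbox assumed)."""

    @lru_cache(maxsize=None)
    def ancestors(role: str) -> FrozenSet[str]:
        # Reflexive-transitive closure of told super-roles, computed once per role.
        seen, stack = {role}, [role]
        while stack:
            for sup in hierarchy.get(stack.pop(), []):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return frozenset(seen)

    @lru_cache(maxsize=None)
    def embeddable(n1: str, n2: str) -> bool:
        # Each (n1, n2) pair is evaluated at most once; repeats hit the cache.
        m1, m2 = pmodels[n1], pmodels[n2]
        if not filter_atoms(m1.atoms) <= m2.atoms:
            return False
        return all(
            any(r in ancestors(s) and embeddable(c, d) for (s, d) in m2.exists)
            for (r, c) in m1.exists
        )

    return embeddable
```

On a chain where each Xk and Yk has the two existentials ∃R.Xk-1 and ∃R.Yk-1, the naive version makes exponentially many calls, while the cached version evaluates at most four name pairs per level.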
The experiments shed some light on the real problem with tableau-based subsumption tests. The problem is not the exponential worst-case behavior per se: the worst-case-exponential algorithm for pmodel embeddable? works fine for SNOMED-ELH. In our opinion, the problem is that standard tableau reasoners retain the pseudo models but always discard the tableau structures used for deciding subsumes?(D, C) = $\neg\mathrm{SAT}(C \sqcap \neg D)$ once the test is completed. This could be avoided if the tableau structures for C and ¬D were also retained and manipulated effectively to obtain the answer. We conjecture that value restrictions, for instance, could also be handled effectively in a tableau-based algorithm if the “right structures” were kept between multiple calls to subsumes?. This is subject to further research as well.
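The identity subsumes?(D, C) = ¬SAT(C ⊓ ¬D) follows directly from the semantics of ⊓ and ¬ given earlier. For a Tbox T, the standard derivation is:

```latex
$$
\begin{aligned}
C \sqsubseteq_{\mathcal{T}} D
  &\iff C^{\mathcal{I}} \subseteq D^{\mathcal{I}} \text{ for every model } \mathcal{I} \text{ of } \mathcal{T} \\
  &\iff C^{\mathcal{I}} \cap (\Delta^{\mathcal{I}} \setminus D^{\mathcal{I}}) = \emptyset \text{ for every model } \mathcal{I} \text{ of } \mathcal{T} \\
  &\iff (C \sqcap \neg D)^{\mathcal{I}} = \emptyset \text{ for every model } \mathcal{I} \text{ of } \mathcal{T} \\
  &\iff C \sqcap \neg D \text{ is unsatisfiable w.r.t. } \mathcal{T}
\end{aligned}
$$
```

A tableau prover therefore answers each subsumption query by attempting to build a model of C ⊓ ¬D; it is exactly this construction whose intermediate structures are discarded after every call.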
References
1. Brachman, R.J., Levesque, H.J.: The tractability of subsumption in frame-based description languages. In: Proc. of the 4th Nat. Conf. on Artificial Intelligence (AAAI'84). (1984) 34–37
2. Borgida, A., Patel-Schneider, P.F.: A semantics and complete algorithm for subsumption in the CLASSIC description logic. J. of Artificial Intelligence Research 1 (1994) 277–308
3. Baader, F.: Terminological cycles in a description logic with existential restrictions. In Gottlob, G., Walsh, T., eds.: Proceedings of the 18th International Joint Conference on Artificial Intelligence, Morgan Kaufmann (2003) 325–330
4. Brandt, S.: Polynomial time reasoning in a description logic with existential restrictions, GCI axioms, and—what else? In de Mántaras, R.L., Saitta, L., eds.: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI-2004), IOS Press (2004) 298–302
5. Baader, F., Lutz, C., Suntisrivaraporn, B.: Is tractable reasoning in extensions of the description logic EL useful in practice? In: Proceedings of the Methods for Modalities Workshop (M4M-05), Berlin, Germany (2005)
6. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI 2005). (2005) 364–369
7. Baader, F., Lutz, C., Suntisrivaraporn, B.: CEL—a polynomial-time reasoner for life science ontologies. In Furbach, U., Shankar, N., eds.: Proceedings of the 3rd International Joint Conference on Automated Reasoning (IJCAR'06). Volume 4130 of Lecture Notes in Artificial Intelligence, Springer-Verlag (2006) 287–291
8. Spackman, K.A., Campbell, K.E., Cote, R.A.: SNOMED RT: A reference terminology for health care. J. of the American Medical Informatics Association (1997) 640–644. Fall Symposium Supplement.
9. Horrocks, I., Sattler, U., Tobies, S.: Reasoning with individuals for the description logic SHIQ. In McAllester, D., ed.: Proc. of the 17th Int. Conf. on Automated Deduction (CADE 2000). Volume 1831 of Lecture Notes in Computer Science, Springer (2000) 482–496
10. Horrocks, I., Tobies, S.: Reasoning with axioms: Theory and practice. In: Proc. of the 7th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2000). (2000) 285–296
11. Spackman, K.A.: Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED RT. J. of the American Medical Informatics Association (2000) Fall Symposium Special Issue.
12. Baader, F., Franconi, E., Hollunder, B., Nebel, B., Profitlich, H.J.: An empirical analysis of optimization techniques for terminological representation systems or: Making KRIS get a move on. Applied Artificial Intelligence. Special Issue on Knowledge Base Management 4 (1994) 109–132
13. Horrocks, I.: Optimisation techniques for expressive description logics. Technical Report UMCS-97-2-1, University of Manchester, Department of Computer Science (1997)
14. Haarslev, V., Möller, R.: High performance reasoning with very large knowledge bases: A practical case study. In: Proc. of the 17th Int. Joint Conf. on Artificial Intelligence (IJCAI 2001). (2001) 161–168
15. Tsarkov, D., Horrocks, I., Patel-Schneider, P.F.: Optimizing terminological reasoning for expressive description logics. J. of Automated Reasoning 39 (2007) 277–316
16. Haarslev, V., Möller, R., Turhan, A.Y.: Exploiting pseudo models for TBox and ABox reasoning in expressive description logics. In: Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2001). Volume 2083 of Lecture Notes in Artificial Intelligence, Springer (2001)