Recognizing Textual Entailment With Tree Edit Distance Algorithms
Milen Kouylekov and Bernardo Magnini, ITC-irst, Centro per la Ricerca Scientifica e Tecnologica, University of Trento, 38050 Povo, Trento, Italy. [email protected], [email protected]
Abstract
This paper summarizes ITC-irst's participation in the PASCAL challenge on Recognizing Textual Entailment (RTE). Given a pair of texts (the text and the hypothesis), the core of our approach is a tree edit distance algorithm applied to the dependency trees of both the text and the hypothesis. If the distance (i.e. the cost of the editing operations) between the two trees is below a certain threshold, empirically estimated on the training data, we assign an entailment relation between the two texts.
1 Introduction
The problem of language variability (i.e. the fact that the same information can be expressed with different words and syntactic constructs) has attracted a lot of interest over the years, and it poses significant challenges for systems aimed at natural language understanding. The example below shows that recognizing the equivalence of the statements "came in power", "was prime-minister" and "stepped in as prime-minister" is a challenging problem.
Ivan Kostov came in power in 1997.
Ivan Kostov was prime-minister of Bulgaria from 1997 to 2001.
Ivan Kostov stepped in as prime-minister 6 months after the December 1996 riots in Bulgaria.
While the language variability problem is well known in Computational Linguistics, a general unifying framework has been proposed only recently (Dagan and Glickman 2004). In this approach, language variability is addressed by defining entailment as a relation that holds between two language expressions (i.e. a text T and a hypothesis H) if the meaning of H, as interpreted in the context of T, can be inferred from the meaning of T. The entailment relation is directional: the meaning of one expression can entail the meaning of the other, while the opposite may not hold.
For our participation in the PASCAL RTE Challenge we designed a system based on the intuition that the probability of an entailment relation between T and H is related to the ability to show that the whole content of H can be mapped into the content of T. The more straightforwardly the mapping can be established, the more probable the entailment relation. Since a mapping can be described as the sequence of editing operations needed to transform T into H, where each edit operation has an associated cost, we assign an entailment relation if the overall cost of the transformation is below a certain threshold, empirically estimated on the training data.
The paper is organized as follows. Section 2 presents the Tree Edit Distance algorithm we have adopted and its application to dependency trees. Section 3 describes the system that participated in the RTE challenge, and in Section 4 we present and discuss the results we have obtained.
2 Tree Edit Distance on Dependency Trees
Insertion: insert a node from the dependency tree of H into the dependency tree of T. When a node is inserted, it is attached with the dependency relation of its source label in H.
Deletion: delete a node N from the dependency tree of T. When N is deleted, all its children are attached to the parent of N. It is not required to explicitly delete the children of N, as they are going to be either deleted or substituted at a following step.
Substitution: change the label of a node N1 in the source tree into the label of a node N2 of the target tree. Substitution is allowed only if the two nodes share the same part-of-speech. In case of substitution, the relation attached to the substituted node is replaced by the relation of the new node.
3 System Architecture
The system is composed of the following modules, shown in Figure 1: (i) a text processing module, for the preprocessing of the input T/H pair; (ii) a matching module, which performs the mapping between T and H; (iii) a cost module, which computes the costs of the edit operations.
Figure 1: System architecture
3.1 Text processing module
The text processing module creates a syntactic representation of a T/H pair and relies on a sentence splitter and a syntactic parser. For sentence splitting we used the maximum entropy sentence splitter MXTerm (Ratnaparkhi 1996). For parsing we used Minipar, a principle-based English parser (Lin 1998) with high processing speed and good precision.
3.2 Matching module
The matching module finds the best sequence of edit operations between the dependency trees obtained from T and H. It implements the edit distance algorithm described in Section 2. The module makes requests to the cost module to receive the cost of the edit operations needed to transform T into H.
3.3 Cost module
The cost module returns the cost of an edit operation between tree nodes. To estimate such a cost, we define a weight for each single word representing its relevance through the inverse document frequency (idf), a measure commonly used in Information Retrieval. If N is the number of documents in a text collection and N_w is the number of documents of the collection that contain word w, then the idf of this word is given by the formula:
$$\mathrm{idf}(w) = \log \frac{N}{N_w} \qquad (1)$$
The weight of the insertion operation is the idf of the inserted word. The most frequent words (e.g. stop words) have a zero cost of insertion. In the current version of the system we have not yet implemented a good model for estimating the cost of the deletion operation; in order not to penalize pairs in which T contains material that is absent from H, we set the cost of deletion to 0. To determine the cost of substitution we used a dependency-based thesaurus available at http://www.cs.ualberta.ca/lindek/downloads.htm. For each word, the thesaurus lists up to 200 most similar words and their similarities. The cost of a substitution is calculated by the following formula:
$$\gamma(w_1, w_2) = \mathrm{ins}(w_2) \cdot \bigl(1 - \mathrm{sim}(w_1, w_2)\bigr) \qquad (2)$$
where w_1 is the word from T that is being replaced by the word w_2 from H, ins(w_2) is the cost of inserting w_2, and sim(w_1, w_2) is the similarity between w_1 and w_2 in the thesaurus multiplied by the similarity between the corresponding relations. The similarity between relations is stored in a database of relation similarities obtained by comparing dependency relations from a parsed local corpus. Similarities range from 1 (very similar) to 0 (not similar). If there is no similarity, the cost of substitution is equal to the cost of inserting the word w_2.
3.4 Global Entailment Score
The entailment score of a T/H pair is the edit distance between T and H, normalized by the cost of inserting the whole of H:
$$\mathrm{score}(T, H) = \frac{ed(T, H)}{ed(\emptyset, H)} \qquad (3)$$
where ed(T, H) is the function that calculates the edit distance cost and ed(∅, H) is the cost of inserting the entire tree H. A similar approach is presented in (Monz and de Rijke 2001), where the entailment score of two documents d and d' is calculated by comparing the sum of the weights (idf) of the terms that appear in both documents to the sum of the weights of all terms in d'. To define the threshold that separates the positive from the negative examples we used the training set provided by the task organizers.
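The matching module searches for the cheapest combination of the three edit operations of Section 2. The sketch below is not the system's implementation: it uses the classic recursive forest formulation (delete the rightmost root of T's forest, insert the rightmost root of H's forest, or substitute one for the other and match children and remaining forests) with memoization, instead of a dedicated algorithm such as Zhang and Shasha (1990). Assumptions: trees are ordered and encoded as hypothetical nested tuples `(label, children)`, labels are hypothetical "word/POS" strings, and the toy costs (`unit_ins`, `free_delete`, `same_pos_sub`) only mirror the constraints stated above (free deletion, substitution restricted to the same part-of-speech).

```python
from functools import lru_cache
from typing import Callable, Tuple

# A node is (label, children); a forest is a tuple of nodes (hashable, so memoizable).
Node = Tuple[str, tuple]
Forest = Tuple[Node, ...]
Cost1 = Callable[[str], float]
Cost2 = Callable[[str, str], float]


def tree_edit_distance(t: Node, h: Node, ins: Cost1, delete: Cost1, sub: Cost2) -> float:
    """Minimal cost of transforming ordered tree t into ordered tree h."""

    @lru_cache(maxsize=None)
    def dist(f: Forest, g: Forest) -> float:
        if not f and not g:
            return 0.0
        if not g:
            # Deleting the rightmost root of f: its children take its place,
            # i.e. they are re-attached to the deleted node's parent.
            label, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete(label)
        if not f:
            label, kids = g[-1]
            return dist(f, g[:-1] + kids) + ins(label)
        (fl, fk), (gl, gk) = f[-1], g[-1]
        return min(
            dist(f[:-1] + fk, g) + delete(fl),  # delete rightmost root of f
            dist(f, g[:-1] + gk) + ins(gl),  # insert rightmost root of g
            # substitute fl -> gl, then match children and the remaining forests
            dist(fk, gk) + dist(f[:-1], g[:-1]) + sub(fl, gl),
        )

    return dist((t,), (h,))


def pos(label: str) -> str:
    """Part-of-speech suffix of a hypothetical "word/POS" label."""
    return label.rsplit("/", 1)[1]


# Toy costs: unit insertion, free deletion, substitution only within the same POS.
def unit_ins(label: str) -> float:
    return 1.0


def free_delete(label: str) -> float:
    return 0.0


def same_pos_sub(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return 1.0 if pos(a) == pos(b) else float("inf")
```

For example, with t = ("came/V", (("Kostov/N", ()), ("power/N", ()))) and h = ("was/V", (("Kostov/N", ()), ("prime-minister/N", ()))), `tree_edit_distance(t, h, unit_ins, free_delete, same_pos_sub)` returns 2.0: "was" and "prime-minister" must each be inserted or substituted at unit cost, while extra material in T could be deleted for free. Memoizing over forests keeps this adequate for sentence-sized trees; larger inputs would call for a dedicated algorithm.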
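The cost model of equations (1) to (3) and the threshold decision can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's code: the document collection is a hypothetical list of word sets, words absent from the collection receive a hypothetical fixed weight `unseen` (equation (1) is undefined when N_w = 0), word and relation similarities are passed in as plain numbers instead of being looked up in the thesaurus, and ed(T, H) is supplied as a precomputed number.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set

DELETION_COST = 0.0  # deletion is free in the described system


def idf_table(documents: List[Set[str]]) -> Dict[str, float]:
    """Eq. (1): idf(w) = log(N / N_w) over a collection of documents (word sets)."""
    n = len(documents)
    doc_freq = Counter(word for doc in documents for word in doc)
    return {word: math.log(n / n_w) for word, n_w in doc_freq.items()}


def insertion_cost(word: str, idf: Dict[str, float], unseen: float) -> float:
    """Insertion weight = idf of the word; a word found in every document costs 0.
    Assumption: words missing from the collection get the fixed weight `unseen`."""
    return idf.get(word, unseen)


def substitution_cost(w2_ins: float, word_sim: float, rel_sim: float) -> float:
    """Eq. (2): ins(w2) * (1 - sim), where sim = word similarity * relation similarity.
    With sim = 0 the cost falls back to the cost of inserting w2."""
    return w2_ins * (1.0 - word_sim * rel_sim)


def entailment_score(ed_t_h: float, h_words: Iterable[str],
                     idf: Dict[str, float], unseen: float) -> float:
    """Eq. (3): ed(T, H) / ed(empty, H), where ed(empty, H) inserts every node of H.
    With free deletion, ed(T, H) <= ed(empty, H), so the score lies in [0, 1]."""
    ed_empty_h = sum(insertion_cost(w, idf, unseen) for w in h_words)
    return ed_t_h / ed_empty_h if ed_empty_h > 0 else 0.0


def entails(score: float, threshold: float) -> bool:
    """Assign entailment when the normalized cost is below the learned threshold."""
    return score < threshold
```

With a toy collection of four documents in which "the" occurs everywhere, `idf_table` gives "the" a weight of 0, matching the zero insertion cost of stop words. The threshold passed to `entails` stands in for the value the system estimated on the training set.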
4 Results
Task   run 1         run 2
CD     0.78  0.89    0.78  0.89
IE     0.48  0.50    0.53  0.53
MT     0.50  0.55    0.49  0.53
QA     0.52  0.49    0.48  0.42
RC     0.52  0.53    0.54  0.58
PP     0.52  0.48    0.48  0.43
IR     0.47  0.51    0.47  0.50
Table 1: ITC-irst results at PASCAL-RTE
[...] could be smaller if the same subtree is deleted from T at a prior or later stage. The current implementation of the system does not use resources (e.g. WordNet, the paraphrases in (Lin and Pantel 2001), or entailment patterns as acquired in (Szpektor et al. 2004)) that could significantly widen the application of entailment rules and, consequently, improve performance. We estimated that for about 40% of the true positive pairs the system could have used entailment rules found in entailment and paraphrasing resources. As an example, pair 565:
T - Sopranos Square: Milan, Italy, home of the famed La Scala opera house, honored soprano Maria Callas on Wednesday when it renamed a new square after the diva.
H - La Scala opera house is located in Milan, Italy.
could be solved successfully using a paraphrase pattern such as "Y, home of X" → "X is located in Y", which can be found in (Lin and Pantel 2001). However, in order to use this kind of entailment rule, it would be necessary to extend the single-node implementation of tree edit distance to address editing operations among subtrees. Our participation in the RTE challenge served as a first test of our system. In the future, we plan to extend the system by searching for solutions to the problems mentioned above and by introducing entailment and paraphrasing resources.
References
Dagan, I. and Glickman, O. 2004. Generic applied modeling of language variability. In Proceedings of the PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble.
Lin, D. 1998. Dependency-based evaluation of MINIPAR. In Proceedings of the Workshop on Evaluation of Parsing Systems at LREC-98, Granada, Spain.
Lin, D. and Pantel, P. 2001. Discovery of inference rules for Question Answering. Natural Language Engineering, 7(4), pages 343-360.
Monz, C. and de Rijke, M. 2001. Light-Weight Entailment Checking for Computational Semantics. In Proceedings of the Third Workshop on Inference in Computational Semantics (ICoS-3).
Punyakanok, V., Roth, D. and Yih, W. 2004. Mapping Dependencies Trees: An Application to Question Answering. In Proceedings of AI & Math 2004.
Ratnaparkhi, A. 1996. A Maximum Entropy Part-Of-Speech Tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference, May 17-18, 1996.
Szpektor, I., Tanev, H., Dagan, I. and Coppola, B. 2004. Scaling Web-based Acquisition of Entailment Relations. In Proceedings of EMNLP-04, Empirical Methods in Natural Language Processing, Barcelona, July 2004.
Zhang, K. and Shasha, D. 1990. Fast algorithm for the unit cost editing distance between trees. Journal of Algorithms, vol. 11, pages 1245-1262, December 1990.