Emnlp05 Textinferencegraphmatching
Table 1: Some Textual Entailment examples. The last three demonstrate some of the harder instances.
Our results are summarized in Table 6.⁶ As the results indicate, the task is particularly hard; all RTE participants scored between 50% and 60% (Dagan et al., 2005). Both GM systems perform better than either Bag-Of-Words or TF-IDF according to both raw accuracy and CWS.

We also present results on a per-task basis in Table 7. Interestingly, there is a large variation in performance depending on the task, suggesting that entailment may be inherently more difficult for some tasks than for others.

⁶ CWS (confidence weighted score) represents the average precision among our most confident predictions. If $\{c_1, \ldots, c_n\}$ are our confidence outputs, ordered from most to least confident, then
$\mathrm{CWS} = \frac{1}{n} \sum_{i=1}^{n} \frac{\text{number of correct predictions in } c_1, \ldots, c_i}{i}$.

S. M. Harabagiu, M. A. Pasca, and S. J. Maiorano. 2000. Experiments with open-domain textual question answering. In COLING, pages 292–298.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In ACL, pages 423–430.

Dekang Lin and Patrick Pantel. 2001. Discovery of inference rules from text. In KDD ’01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 323–328, New York, NY, USA. ACM Press.

Dan I. Moldovan, Christine Clark, Sanda M. Harabagiu, and Steven J. Maiorano. 2003. Cogex: A logic prover for question answering. In HLT-NAACL.

K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2001. Bleu: a method for automatic evaluation of machine translation.
| Text | Hypothesis | True answer | Our answer | Conf. | Comments |
|---|---|---|---|---|---|
| A Filipino hostage in Iraq was released. | A Filipino hostage was freed in Iraq. | True | True | 0.84 | Verb rewrite is handled. Phrasal ordering does not affect cost. |
| The government announced last week that it plans to raise oil prices. | Oil prices drop. | False | False | 0.95 | High cost given for substituting word for its antonym. |
| Shrek 2 rang up $92 million. | Shrek 2 earned $92 million. | True | False | 0.59 | Collocation “rang up” is not known to be similar to “earned”. |
| Sonia Gandhi can be defeated in the next elections in India by BJP. | Sonia Gandhi is defeated by BJP. | False | True | 0.77 | “can be” does not indicate the complement event occurs. |
| Fighters loyal to Moqtada al-Sadr shot down a U.S. helicopter Thursday in the holy city of Najaf. | Fighters loyal to Moqtada al-Sadr shot down Najaf. | False | True | 0.67 | Should recognize non-Location cannot be substituted for Location. |
| C and D Technologies announced that it has closed the acquisition of Datel, Inc. | Datel Acquired C and D technologies. | False | True | 0.64 | Failed to penalize switch in semantic role structure enough. |
Table 3: Analysis of results on some RTE examples, along with our guesses and confidence probabilities.
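
The confidence weighted score defined in footnote 6 can be illustrated with a small worked example. The sketch below is not the authors' evaluation code: the function name `confidence_weighted_score` is hypothetical, and the input is simply the six rows of Table 3, with each prediction marked correct when "Our answer" matches "True answer". Those rows were chosen for error analysis, so the resulting number illustrates the arithmetic only and says nothing about performance on the full test set.

```python
from typing import Sequence, Tuple


def confidence_weighted_score(predictions: Sequence[Tuple[float, bool]]) -> float:
    """Compute CWS from (confidence, is_correct) pairs.

    Predictions are ranked from most to least confident; CWS is the
    mean, over ranks i = 1..n, of the precision among the top i.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for i, (_, is_correct) in enumerate(ranked, start=1):
        correct_so_far += int(is_correct)
        total += correct_so_far / i  # precision among the top-i predictions
    return total / len(ranked)


# (confidence, our answer == true answer) for the six rows of Table 3.
# Illustrative input only: these rows are not a representative sample.
TABLE3_ROWS = [
    (0.84, True),   # Filipino hostage
    (0.95, True),   # oil prices
    (0.59, False),  # Shrek 2
    (0.77, False),  # Sonia Gandhi
    (0.67, False),  # Najaf
    (0.64, False),  # Datel
]

score = confidence_weighted_score(TABLE3_ROWS)
accuracy = sum(ok for _, ok in TABLE3_ROWS) / len(TABLE3_ROWS)
print(f"CWS = {score:.2f}, raw accuracy = {accuracy:.2f}")
```

On these six rows the two correct predictions are also the two most confident ones, so CWS (0.65) is well above raw accuracy (about 0.33). This shows how CWS rewards a system whose confidence values rank its correct answers first.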