Recognizing Text Entailment - Tutorial
Darsh Shah
Pratyaksh Sharma
Definition Continued
This definition is very strict: it requires h to be
true in every instance where t is true.
Example
T: Sachin received an award for
batsmanship, from the ICC.
H: The God of Cricket received an award.
Definition Continued
T entails H only if "Sachin" refers to Sachin
Tendulkar, who is popularly called the God of Cricket.
This is the most likely reading, but it is not always true.
So a modified definition is required.
Applied Definition: A text t entails a
hypothesis h if a human reading t would infer
that h is most likely true.
Mathematical Definition
Hypothesis h is entailed by text t if
P(h is true | t) > P(h is true),
where P(h is true | t) is the entailment
confidence and can be considered a
measure of how certain the entailment is.
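The inequality above can be checked on a toy sample. The following is a minimal sketch, assuming we have a list of observed situations, each recording whether t and h hold; the function names and the data are hypothetical and only illustrate the comparison of P(h is true | t) with P(h is true).

```python
def entailment_confidence(observations):
    """Estimate (P(h | t), P(h)) from (t_true, h_true) boolean pairs."""
    n_total = len(observations)
    n_t = sum(1 for t, _ in observations if t)
    if n_total == 0 or n_t == 0:
        raise ValueError("need at least one observation where t is true")
    n_h = sum(1 for _, h in observations if h)
    n_th = sum(1 for t, h in observations if t and h)
    prior = n_h / n_total      # P(h is true)
    confidence = n_th / n_t    # P(h is true | t)
    return confidence, prior


def entails(observations):
    """Apply the probabilistic definition: P(h | t) > P(h)."""
    confidence, prior = entailment_confidence(observations)
    return confidence > prior


# Hypothetical sample: t holds in 4 of 8 situations, h holds in 3 of those 4.
sample = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 3
confidence, prior = entailment_confidence(sample)
```

Here knowing t raises the estimated probability of h from 0.5 to 0.75, so under this definition t counts as entailing h.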
Entailment Triggers
Semantic phenomena significant to Textual
Entailment
T: Sachin achieved the milestone of 100
centuries in his career.
H: Sachin attained the milestone of 100
centuries in his career.
The words "achieved" and "attained" are synonyms, so T entails H.
Generalizations or specializations of
concepts in Text or Hypothesis can affect
entailment
Example
T: Sachin Tendulkar is a cricketer.
H: Sachin Tendulkar is a sportsman.
Here sportsman is a generalization of
cricketer.
Information Retrieval
Textual entailment impacts IR in at least
two ways:
The notion of relevance bears a strong similarity
to that of entailment.
Textual entailment can be used to find
affinities between words, which can then be
used to compute an extended similarity
between documents and queries.
Question Answering
A text T is retrieved for a question Q.
Each entity in T is substituted into the question as a
potential answer to obtain candidate
hypotheses H1, H2, ..., Hn.
We then pick the Hi that is best entailed by
T as the answer to the
question Q.
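The procedure above can be sketched as follows. This is only an illustration: the bigram-overlap scorer is a stand-in for a real entailment system, and the "{answer}" template, the entity list, and all function names are assumptions made for the example.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def bigram_overlap(text, hypothesis):
    """Toy entailment score: fraction of hypothesis bigrams found in the text."""
    t = tokenize(text)
    h = tokenize(hypothesis)
    h_bigrams = list(zip(h, h[1:]))
    if not h_bigrams:
        return 0.0
    t_bigrams = set(zip(t, t[1:]))
    return sum(1 for b in h_bigrams if b in t_bigrams) / len(h_bigrams)


def best_answer(text, template, entities, score=bigram_overlap):
    """Build one hypothesis per entity and return the best-entailed entity."""
    return max(entities, key=lambda e: score(text, template.format(answer=e)))


# Q: "Who received an award?" rewritten as a template with an answer slot.
text = "Sachin received an award for batsmanship, from the ICC."
template = "{answer} received an award."
entities = ["Sachin", "the ICC"]  # assumed to come from an entity extractor
answer = best_answer(text, template, entities)
```

A bag-of-words score would tie on this example, because every word of both hypotheses occurs in T; the bigram score separates them only because "ICC received" never appears in T.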
Miscellaneous
Equivalence between two texts can be
tested by applying textual entailment in
both directions. This is useful for novelty
detection, copying detection, etc.
Text simplification: substituting complex
phrases with simpler phrases to produce
sentences that are grammatically correct
and convey the meaning in a simpler way.
Results
We need to calculate an entailment threshold
above which we'll declare entailment.
We pick the threshold that gives the best accuracy on
the training set.
With a threshold of 0.55,
we get an accuracy of 0.6138 on the RTE2
development set.
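The threshold search described above can be sketched as a simple sweep over the scores seen in training. This is a minimal illustration with made-up scores and labels, not the code used to obtain the reported numbers; the function names are hypothetical.

```python
def accuracy(scores, labels, threshold):
    """Fraction of pairs where (score > threshold) matches the true label."""
    correct = sum((s > threshold) == y for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Try each observed score (plus one value below all of them) as threshold."""
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    # max() keeps the first, i.e. lowest, threshold among ties.
    return max(candidates, key=lambda th: accuracy(scores, labels, th))


# Hypothetical training data: entailment scores and true entailment labels.
train_scores = [0.2, 0.4, 0.6, 0.8]
train_labels = [False, False, True, True]
threshold = best_threshold(train_scores, train_labels)
```

The chosen threshold is then held fixed and accuracy is reported on a separate set, as in the development-set figure above.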
Test Cases
T: The Rolling Stones kicked off their latest tour on Sunday
with a concert at Boston's Fenway Park.
H: The Rolling Stones have begun their latest tour with a
concert in Boston.
Expected: YES
The system identifies this correctly.
Test Cases
T: Craig Conway, fired as PeopleSoft's chief executive
officer before the company was bought by Oracle, was in
England last week.
H: Craig Conway works for Oracle.
Expected: NO
The system fails here and predicts YES.
Conclusion
This method is too inaccurate.
It cannot differentiate between a sentence and its
negation even in the simplest form, and may declare
entailment between them.
Results
We need to calculate an entailment threshold
above which we'll declare entailment.
With a threshold of 0.63,
we get an accuracy of 0.625 on the RTE2
development set.
Test Case
T: Sunday's earthquake was felt in the southern Indian city
of Madras on the mainland, as well as other parts of south
India. The Naval meteorological office in Port Blair said it
was the second biggest aftershock after the Dec. 26
earthquake.
H: The city of Madras is located in Southern India.
Expected: YES
The system identifies this correctly.
Test Case
T: ECB spokeswoman, Regina Schueller, declined to
comment on a report in Italy's La Repubblica newspaper
that the ECB council will discuss Mr. Fazio's role in the
takeover fight at its Sept. 15 meeting.
H: Regina Schueller works for Italy's La Repubblica
newspaper.
Expected: NO
The system incorrectly declares entailment.
Observations
Again, the method is not dependable for even moderately
complicated sentences.
Results
We need to calculate an entailment threshold
above which we'll declare entailment.
With a threshold of 0.63,
we get an accuracy of 0.6225 on the RTE2
development set.
Test Case
T: It is also an acronym that stands for Islamic Resistance
Movement, a militant Islamist Palestinian organization
that opposes the existence of the state of Israel and
favors the creation of an Islamic state in Palestine.
H: The Islamic Resistance Movement is also known as the
Militant Islamic Palestinian Organization.
Expected: NO
The system incorrectly declares entailment.
BLEU algorithm
For several values of N (typically 1 to
4), calculate the percentage of n-grams
from the hypothesis that appear in
the text.
Combine the scores obtained for each value
of N as a weighted linear average.
BLEU algorithm
Apply a brevity factor to penalise short texts
(which may have n-grams in common with the
references, but may be incomplete).
The higher the BLEU score, the stronger the evidence for entailment.
Learn a threshold for the BLEU score from the
development set.
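The steps above can be sketched in a few lines. This is a simplified illustration rather than a reference BLEU implementation: it assumes uniform weights for the linear average, clipped n-gram counts, and the standard BLEU brevity factor with the hypothesis as candidate and the text as reference. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu_entailment_score(text, hypothesis, max_n=4):
    """BLEU-style score of the hypothesis against the text (assumptions above)."""
    t, h = tokenize(text), tokenize(hypothesis)
    precisions = []
    for n in range(1, max_n + 1):
        h_counts = Counter(ngrams(h, n))
        if not h_counts:
            continue  # hypothesis is shorter than n tokens
        t_counts = Counter(ngrams(t, n))
        matched = sum(min(c, t_counts[g]) for g, c in h_counts.items())
        precisions.append(matched / sum(h_counts.values()))
    if not precisions:
        return 0.0
    average = sum(precisions) / len(precisions)  # uniform linear average
    # Brevity factor: penalise a hypothesis that is shorter than the text.
    brevity = 1.0 if len(h) >= len(t) else math.exp(1 - len(t) / len(h))
    return brevity * average


same = bleu_entailment_score("the cat sat on the mat", "the cat sat on the mat")
partial = bleu_entailment_score("the cat sat on the mat", "the cat sat on a rug")
unrelated = bleu_entailment_score("the cat sat on the mat", "dogs bark loudly")
```

Entailment would then be declared when the score exceeds the learned threshold. Because hypotheses are usually much shorter than texts, the brevity factor shrinks most scores considerably, which is consistent with a low threshold working best.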
Test Cases
T: Patricia Amy Messier and Eugene W. Weaver were
married May 28 at St. Clare Roman Catholic Church in
North Palm Beach.
H: Eugene W. Weaver is the husband of Patricia Amy.
Expected: YES
The system identifies this correctly,
possibly because of the very low threshold used. The other
systems fail to predict this.
Conclusion
It fails to capture deep semantic relations
between sentence pairs like the previous ones.
It can be used as a baseline technique that is quick
to evaluate.
A Discourse Commitment-Based
Framework for Recognizing Textual
Entailment
A new framework for recognizing textual
entailment that depends on the set of
publicly held beliefs, known as discourse
commitments, that can be ascribed to the
author of a text or a hypothesis.
System Architecture
Commitment Selection
Following Commitment Extraction, a word
alignment technique first introduced in
(Taskar et al., 2005b) is used to
select, for each commitment
extracted from h (henceforth, ch), the commitment
extracted from t (henceforth, ct) that
represents the best alignment.
IKOMA
One of the best performing submissions in
RTE-7 (Text Analysis Conference 2011)
Title: A Method for Recognizing Textual
Entailment using Lexical-level and Sentence
Structure-level features
It had the highest F-measure (48.00) on the
dataset; the next best was 45.13.
Approach
First, calculate an entailment score based on
lexical-level matching.
Combine it with machine learning based
filtering using various features obtained from
lexical-level, chunk-level and predicate
argument structure-level information.
Approach
Role of filtering: to discard T-H pairs that have a
high entailment score but are not actually entailments.
It uses features at a higher level than the lexical level.
SENNA is used to analyze POS tags,
word chunks, named entities (NER) and predicate-argument
structures.
Notation:
freq(t) is the frequency of t in the corpus.
R is the set of knowledge resources.
Tt and Ht are the sets of words in T and H, respectively.
Filtering stage
We train a model that classifies T-H pairs
with a high lexical entailment score (LES) as false positive or true positive.
If the model predicts a T-H pair to be a false positive, we discard that pair from
the candidate entailment T-H pairs.
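The two-stage flow above can be sketched as follows. This is only an outline under stated assumptions: the trained classifier is replaced by a hand-written rule, and the field names "les" and "pas_match", the threshold, and the data are hypothetical rather than taken from the IKOMA system.

```python
def filter_candidates(pairs, les_threshold, is_false_positive):
    """Keep pairs with a high LES that the filter does not reject."""
    candidates = [p for p in pairs if p["les"] > les_threshold]
    return [p for p in candidates if not is_false_positive(p)]


def rule_based_filter(pair):
    """Stand-in for the trained model: reject pairs with weak structural match."""
    return pair["pas_match"] < 0.5


# Hypothetical T-H pairs with a lexical score and one structure-level feature.
pairs = [
    {"id": "a", "les": 0.9, "pas_match": 0.8},  # high LES, passes the filter
    {"id": "b", "les": 0.9, "pas_match": 0.1},  # high LES, likely false positive
    {"id": "c", "les": 0.2, "pas_match": 0.9},  # low LES, never a candidate
]
kept = filter_candidates(pairs, les_threshold=0.5, is_false_positive=rule_based_filter)
kept_ids = [p["id"] for p in kept]
```

In a real system the second argument would be a model trained on lexical-level, chunk-level and predicate-argument features; only the control flow is shown here.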
PAS level
Feature: the matching ratio for each argument type (A0, A1) over
all corresponding predicate-argument structure (PAS) pairs, computed
for each semantic relation between the two predicates.
Results
Three solvers were submitted:
1. IKOMA1: lexical entailment score + filtering
with threshold set empirically
2. IKOMA2: same as IKOMA1 with threshold 0
3. IKOMA3: lexical entailment score only
Results
Phases of n-gram matching
To match n-grams, MAXSIM goes through a
sequence of three phases: lemma and POS
matching, lemma matching, and bipartite
graph matching.
We will illustrate the matching process
using unigrams, then describe the extension
to bigrams and trigrams.
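The three phases can be illustrated for unigrams with the sketch below. It is not the MAXSIM implementation: tokens are assumed to be (lemma, POS) tuples, the similarity table is a toy stand-in for a lexical resource, and the bipartite phase uses brute-force enumeration, which is only practical for a handful of tokens. All names are hypothetical.

```python
from itertools import permutations


def match_exact(left, right, key):
    """Pair up items whose key(...) values are equal; return pairs and leftovers."""
    pairs, rest_left, rest_right = [], [], list(right)
    for a in left:
        for b in rest_right:
            if key(a) == key(b):
                pairs.append((a, b))
                rest_right.remove(b)
                break
        else:
            rest_left.append(a)
    return pairs, rest_left, rest_right


def best_bipartite(left, right, sim):
    """Maximum-weight bipartite matching by brute force (small inputs only)."""
    if len(left) > len(right):
        flipped = best_bipartite(right, left, lambda x, y: sim(y, x))
        return [(a, b) for b, a in flipped]
    best, best_score = [], 0.0
    for perm in permutations(right, len(left)):
        pairs = [(a, b) for a, b in zip(left, perm) if sim(a, b) > 0]
        score = sum(sim(a, b) for a, b in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best


def match_unigrams(hyp, txt, sim):
    """Run the three phases in order on (lemma, POS) tokens."""
    p1, hyp, txt = match_exact(hyp, txt, key=lambda tok: tok)     # lemma and POS
    p2, hyp, txt = match_exact(hyp, txt, key=lambda tok: tok[0])  # lemma only
    p3 = best_bipartite(hyp, txt, sim)                            # similarity
    return p1, p2, p3


SYNONYMS = {frozenset(("attain", "achieve")): 1.0}  # toy similarity table


def toy_similarity(a, b):
    """Similarity of two tokens, looked up by their lemmas."""
    return SYNONYMS.get(frozenset((a[0], b[0])), 0.0)


hyp = [("attain", "V"), ("milestone", "N"), ("run", "N")]
txt = [("achieve", "V"), ("milestone", "N"), ("run", "V")]
p1, p2, p3 = match_unigrams(hyp, txt, toy_similarity)
```

Each phase only sees the tokens left unmatched by the previous one, so the cheaper exact matches are fixed before the more permissive similarity matching runs.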
Calculation of F-score
References
1. Diana Perez and Enrique Alfonseca, Application of the Bleu algorithm for recognising textual entailments
2. Dan Roth, Recognizing Textual Entailment
3. Yee Seng Chan and Hwee Tou Ng, MAXSIM: An Automatic Metric for Machine Translation Evaluation Based on Maximum Similarity
4. Masaaki Tsuchida and Kai Ishikawa, IKOMA at TAC2011: A Method for Recognizing Textual Entailment using Lexical-level and Sentence Structure-level features
5. Andrew Hickl and Jeremy Bensley, A Discourse Commitment-Based Framework for Recognizing Textual Entailment