Recognizing Text Entailment - Tutorial

Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively.


Text Entailment

Darsh Shah
Pratyaksh Sharma

Introduction to Textual Entailment


Textual Entailment can be defined as the
phenomenon of inferring one text from
another
A text t entails a hypothesis h if h is true in
every possible world in which t is true

Definition Continued
This definition is very strict: it requires the
truthfulness of h in all instances where
t is true
Example
T: Sachin received an award for
batsmanship, from the ICC.
H: The God of Cricket received an award.

Definition Continued
T entails H only when Sachin is Sachin
Tendulkar. This is the more likely reading,
but it is not always true.
So a modified definition is required
Applied Definition: A text t entails a
hypothesis h if a human reading t will infer
that h is most likely true

Mathematical Definition
Hypothesis h is entailed by text t if
P(h is true | t) > P(h is true)
where P(h is true | t) is the Entailment
Confidence and can be considered a
measure of how sure we are of the entailment

Entailment Triggers
Semantic phenomena significant to Textual
Entailment
T: Sachin achieved the milestone of 100
centuries in his career.
H: Sachin attained the milestone of 100
centuries in his career.
The two verbs, achieved and attained, are synonyms

Generalizations or specializations of
concepts in Text or Hypothesis can affect
entailment
Example
T: Sachin Tendulkar is a cricketer.
H:Sachin Tendulkar is a sportsman.
Here sportsman is a generalization of
cricketer

Other triggers include verb entailment and
entailment through change of quantifiers
Polarity, factivity, implicative verbs, and
iteratives can also lead to entailment

Applications of Textual Entailment


Many natural language processing
applications can benefit from textual
entailment: Question Answering (QA),
Information Extraction (IE), (multi-document)
summarization, and Machine Translation (MT)
evaluation

Information Retrieval
Textual entailment impacts IR in at least
two ways
Notion of relevance bears strong similarity
with that of entailment
Textual entailment can be used to find
affinities between words, which can be
used to compute an extended similarity
between documents and queries

Question Answering
A given text T is retrieved for Question Q
All entities in text T are substituted as
potential answers to obtain candidate
hypotheses H1, H2, ..., Hn.
We then pick the best entailed Hi for the
given text T to be the answer for the
question Q.

Machine Translation Evaluation


Machine Translation evaluation involves
comparing the machine translated sentence
with the reference output
Textual entailment helps in this case as it
gives a measure of similarity of the
information conveyed by the reference and
machine output

Miscellaneous
Equivalence between two texts can be
checked by applying textual entailment in
both directions. This is useful for novelty
detection, copy detection, etc.
Text simplification: substituting complex
phrases with simpler ones, producing
sentences that are grammatically correct
and convey the meaning in a simpler way

Some basic approaches implemented


Plain word matching
1. Calculate matching words between the text
and the hypothesis
2. Score = (# matching_words)/(# words in
hypothesis)
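The two steps above can be sketched as follows (whitespace tokenization and lowercasing are simplifying assumptions; the 0.55 threshold is the value tuned below):

```python
def word_match_score(text, hypothesis):
    """Score = (# matching words) / (# words in hypothesis)."""
    text_words = set(text.lower().split())
    hyp_words = hypothesis.lower().split()
    matches = sum(1 for w in hyp_words if w in text_words)
    return matches / len(hyp_words)

def entails(text, hypothesis, threshold=0.55):
    # Declare entailment when the score clears the tuned threshold.
    return word_match_score(text, hypothesis) >= threshold
```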

Results
We need to learn an entailment threshold,
above which we'll declare entailment.
We find the best accuracy giving threshold on
the training set.
With a threshold = 0.55,
We get accuracy = 0.6138 on the RTE2
development set.

Test Cases
T: The Rolling Stones kicked off their latest tour on Sunday
with a concert at Boston's Fenway Park.
H: The Rolling Stones have begun their latest tour with a
concert in Boston.

Yes
Correctly identifies

Test Cases
T: Craig Conway, fired as PeopleSoft's chief executive
officer before the company was bought by Oracle, was in
England last week.
H: Craig Conway works for Oracle.

NO
Fails to identify; incorrectly predicts yes

Conclusion
Too inaccurate a method
It can't differentiate between a sentence and its
negation; even in the simplest such case, it might
declare the entailment true

Some basic approaches


Plain lemma matching
1. Lemmatize the text and the hypothesis
2. Calculate matching lemmas between the
two
3. Score = (# matching_lemmas)/(# lemmas in
hypothesis)

Results
We need to learn an entailment threshold,
above which we'll declare entailment.
With a threshold = 0.63,
We get accuracy = 0.625 on the RTE2
development set.

Test Case
T: Sunday's earthquake was felt in the southern Indian city
of Madras on the mainland, as well as other parts of south
India. The Naval meteorological office in Port Blair said it
was the second biggest aftershock after the Dec. 26
earthquake.
H: The city of Madras is located in Southern India.
YES
Entails correctly

Test Case
T: ECB spokeswoman, Regina Schueller, declined to
comment on a report in Italy's La Repubblica newspaper
that the ECB council will discuss Mr. Fazio's role in the
takeover fight at its Sept. 15 meeting.
H: Regina Schueller works for Italy's La Repubblica
newspaper.
NO
Entails incorrectly

Observations
Again, not dependable even for moderately
complicated sentences

Some basic approaches


Lemma + POS matching
1. Lemmatize the text and the hypothesis
2. Label with POS tags
3. Calculate the number of matching (lemma,
POS_tag) pairs between the two
4. Score = (# matches)/(# lemmas in
hypothesis)
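The scoring step above can be sketched as follows, assuming lemmatization and POS tagging have already been done by some NLP toolkit upstream (the (lemma, tag) pairs below are illustrative inputs, not the output of a real tagger):

```python
def lemma_pos_score(text_pairs, hyp_pairs):
    """Score = (# matching (lemma, POS) pairs) / (# lemmas in hypothesis).

    text_pairs / hyp_pairs: lists of (lemma, pos_tag) tuples produced
    by a lemmatizer and POS tagger."""
    text_set = set(text_pairs)
    matches = sum(1 for pair in hyp_pairs if pair in text_set)
    return matches / len(hyp_pairs)
```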

Results
We need to learn an entailment threshold,
above which we'll declare entailment.
With a threshold = 0.63,
We get accuracy = 0.6225 on the RTE2
development set

Test Case
T: It is also an acronym that stands for Islamic Resistance
Movement, a militant Islamist Palestinian organization
that opposes the existence of the state of Israel and
favors the creation of an Islamic state in Palestine.
H: The Islamic Resistance Movement is also known as the
Militant Islamic Palestinian Organization.
NO
The system fails, incorrectly predicting entailment

Some basic approaches


Using the BLEU algorithm
Basically, the algorithm looks for n-gram
coincidences between a candidate text and a reference text.
It can be used as a basic lexical level
benchmark for other textual entailment
methods

BLEU algorithm
For several values of N (typically from 1 to
4), calculate the percentage of n-grams
from the hypothesis which appear in the
text.
Combine the marks obtained for each value
of N, as a weighted linear average.

BLEU algorithm
Apply a brevity factor to penalise short texts
(which may have n-grams in common with the
references, but may be incomplete).
The higher the BLEU score, the stronger the entailment.
Learn a threshold for the BLEU score from the
development set.
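A rough sketch of this BLEU-based check (uniform weights for the linear average and the standard BLEU brevity-penalty form are assumptions; the hypothesis plays the role of the candidate):

```python
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_score(text, hypothesis, max_n=4):
    t, h = text.lower().split(), hypothesis.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        h_grams = ngrams(h, n)
        if not h_grams:          # hypothesis shorter than n tokens
            break
        t_grams = set(ngrams(t, n))
        precisions.append(sum(1 for g in h_grams if g in t_grams) / len(h_grams))
    score = sum(precisions) / len(precisions)      # linear average, uniform weights
    # Brevity factor penalizing hypotheses much shorter than the text.
    brevity = min(1.0, math.exp(1 - len(t) / len(h)))
    return brevity * score
```

Entailment would then be declared when the score clears the learned threshold (0.0585 below).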

Results from BLEU


Learned threshold = 0.0585
This means that only 5.85% of n-grams need to match
before we declare entailment!
Still, accuracy on RTE2 development set with this
parameter = 0.6050

Test Cases
T: Patricia Amy Messier and Eugene W. Weaver were
married May 28 at St. Clare Roman Catholic Church in
North Palm Beach.
H: Eugene W. Weaver is the husband of Patricia Amy.
Yes
Entails Correctly
Possibly because of the very low threshold used. Other
systems fail to predict this

Conclusion
Fails to understand deep semantic relations
of sentence pairs like the previous ones
It can be used as a baseline technique, since it is quick
to evaluate

A Discourse Commitment-Based
Framework for Recognizing Textual
Entailment
A new framework for recognizing Textual
Entailment that depends on the set of
publicly held beliefs, known as discourse
commitments, that can be ascribed to the
author of a text or a hypothesis

Inspiration for the approach


Shallow approaches had been moderately
successful in the previous two RTE challenges
These approaches fail as the
sentences become larger and more
syntactically complex

Formal Definition of the Problem


Given a commitment set {ct} consisting of the
set of discourse commitments inferable from a
text t, and a hypothesis h, define the task of
RTE as a search for the commitment c ∈ {ct}
which maximizes the likelihood that t
textually entails h

System Architecture

Extracting Discourse Commitments


After preprocessing, some heuristics are
used to extract discourse commitments
Sentence segmentation, syntactic
decomposition, supplementary expressions,
relation extraction, and coreference
resolution

Commitment Selection
Following commitment extraction, a word
alignment technique first introduced in
(Taskar et al., 2005b) is used to
select the commitment extracted from t
(henceforth, ct) which represents the best
alignment for each of the commitments
extracted from h (henceforth, ch)

The alignment of two discourse
commitments can be cast as a maximum
weighted matching problem in which each
pair of words (ti, hj) in a commitment
pair (ct, ch) is assigned a score sij(t, h)
corresponding to the likelihood that ti is
aligned to hj
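The paper learns the scores sij and solves the matching with standard algorithms; purely as an illustration of the objective, here is a brute-force maximum-weight matching over a tiny score matrix (the matrix values are made up):

```python
from itertools import permutations

def best_alignment(scores):
    """Exhaustive maximum-weight bipartite matching.

    scores[i][j] = s_ij, the likelihood that text word t_i aligns to
    hypothesis word h_j. Real systems use the Hungarian algorithm;
    this O(n!) search only illustrates the objective on tiny inputs.
    Assumes len(scores) <= len(scores[0])."""
    n_t, n_h = len(scores), len(scores[0])
    best_total, best_map = float("-inf"), None
    for cols in permutations(range(n_h), n_t):
        total = sum(scores[i][cols[i]] for i in range(n_t))
        if total > best_total:
            best_total, best_map = total, dict(enumerate(cols))
    return best_total, best_map
```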

A set of parameters w is learned
which maximizes the number of correct
alignment predictions (y) in a given training
set (x)

Features used in the model


string features (including Levenshtein edit
distance, string equality, and stemmed
string equality)
lexico-semantic features (including
WordNet similarity and named-entity
similarity)
word association features

Following alignment, the method uses the
sum of the edge scores and searches for the
ct that represents the reciprocal best hit
That is, it selects a commitment pair (ct,
ch) where ct was the top-scoring alignment
candidate for ch and ch was the top-scoring
alignment candidate for ct

Entailment and Results


Textual entailment selection is done based
on the decision tree shown in the system
architecture
The following shows the results on RTE-3
test dataset

IKOMA
One of the best performing submissions in
RTE-7 (Text Analysis Conference 2011)
Title: A Method for Recognizing Textual
Entailment using Lexical-level and Sentence
Structure-level features
It had the highest F-measure (48.00) on the
dataset; the next best was 45.13

Approach
First, calculate an entailment score based on
lexical-level matching.
Combine it with machine learning based
filtering using various features obtained from
lexical-level, chunk-level and predicate
argument structure-level information.

Approach
Role of filtering: to discard T-H pairs that have
a high entailment score but are not actually
entailed, using features above the lexical level.
SENNA is used for analyzing POS of words,
word chunks, NER and predicate-argument
structures

Knowledge resources used


Acronyms extracted from the corpus:
created for organizational names with more
than three words.
WordNet
CatVar: contains categorical variations of
English lexemes.

Lexical Entailment Score

[Equation (1) is a figure and is not reproduced here.]
freq(t) is the frequency of t in the corpus;
R is the set of knowledge resources;
Tt and Ht are the sets of words in T and H, respectively.

Lexical Entailment Score


match(t, Tt, R) takes the value 1 if word t corresponds
to a word in Tt (also considering synonyms and
derived words from R); otherwise match()
takes the value 0.
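A hedged sketch of the lexical entailment score under the slide's definitions. The exact weighting in equation (1) is not reproduced on the slide, so the inverse-corpus-frequency weight (1/freq(t))^α is an assumption; α = 1.8 is the value the slides report as optimal. `resources` stands in for R as a sequence of synonym dictionaries (WordNet, acronyms, CatVar):

```python
def lexical_entailment_score(h_words, t_words, freq, resources=(), alpha=1.8):
    """Weighted coverage of hypothesis words by the text.

    freq: word -> corpus frequency; resources: dicts mapping a word to
    its set of synonyms/derived forms (an illustrative stand-in for R).
    The (1/freq)^alpha weighting is an assumption, not the paper's
    exact equation (1)."""
    t_set = set(t_words)

    def match(word):
        if word in t_set:
            return 1
        # Also count a match via synonyms/derived words from R.
        return int(any(word in res.get(t, ()) for res in resources for t in t_set))

    weights = {w: (1.0 / freq.get(w, 1)) ** alpha for w in h_words}
    total = sum(weights.values())
    return sum(match(w) * weights[w] for w in h_words) / total
```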

Lexical Entailment Score


The Lexical Entailment Score is calculated for
all H-T pairs in the development set and a
threshold is chosen which gives the highest
micro-average F-measure.
Experiments are also done to find the
optimum value of α in equation (1).
By testing, α = 1.8 is found to be optimal

Filtering stage
We train a model that classifies T-H pairs
having a high LES as false positives or true positives.
If the model predicts a T-H pair as a false positive,
then we discard that pair from the
entailment T-H pair candidates.

Features for classifier


The lib-svm package is used, with features
like:
lexical-level:
Entailment Score ent_sc
Cosine similarity
Entailment score, comparing only words with same
POS tag

Features for classifier


Chunk level
Matching ratios for each chunk type (e.g. NP and
VP) in all corresponding chunk pairs

PAS level: for all corresponding PAS pairs:
Matching ratio for each argument type (A0, A1)
Number of negation mismatches
Number of modal verb mismatches
The semantic relation of the two predicates can be:
same-expression, synonym, antonym, entailment, or no-relation

Computing the features


To acquire the above chunk- and PAS-level
features, we need to detect
corresponding pairs that should be checked
for entailment.
We then need to detect whether such
corresponding pairs are in an entailment
relation.

For the first problem


1. Transform all words contained in PAS into a
word vector using bag of words
representation
2. Calculate the cosine similarity for all PAS
pairs that are generated by combining PAS
from each T and H.
3. We regard the most similar PAS from T for
each PAS from H as corresponding pairs.
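Steps 1-3 above can be sketched as follows (plain word lists stand in for real predicate-argument structures from SENNA):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def corresponding_pairs(t_pas_list, h_pas_list):
    """For each PAS from H (a list of words), pick the most
    cosine-similar PAS from T as its corresponding pair."""
    pairs = []
    for h_pas in h_pas_list:
        h_vec = Counter(h_pas)
        best = max(t_pas_list, key=lambda t_pas: cosine(Counter(t_pas), h_vec))
        pairs.append((best, h_pas))
    return pairs
```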

For the latter problem


1. For each corresponding pair, we calculate
our lexical entailment score between the
words of each argument type of the PAS
from H (as H in equation 1) and the words
of the same argument type of the PAS from
T (as T in equation 1)
2. Apply a threshold (pre-defined) to identify
entailment

Results
Three solvers were submitted:
1. IKOMA1: lexical entailment score + filtering
with threshold set empirically
2. IKOMA2: same as IKOMA1 with threshold 0
3. IKOMA3: lexical entailment score only

Results

MaxSim: An Automatic Metric for Machine
Translation Evaluation Based on Maximum
Similarity

The metric calculates a similarity score
between a pair of English system-reference
sentences by comparing information items
such as n-grams across the sentence pair

Unlike most metrics, MaxSim computes a
similarity score between items, then finds
a maximum-weight matching
between the items such that each item in
one sentence is mapped to at most one
item in the other sentence
Evaluation on the WMT07, WMT08, and
MT06 datasets show that MAXSIM achieves
good correlations with human judgment

Given a pair of English sentences to be
compared, MaxSim performs tokenization,
lemmatization using WordNet, and Part of
Speech (POS) tagging
Next, all non-alphanumeric tokens are
removed
A set of WordNet synonyms is gathered for
each word; these are used for computing
similarity

Matching Using N-gram Information


To calculate a similarity score for a pair of
system-reference translation sentences,
MaxSim extracts and compares n-gram
information
Based on these comparisons or matches
across the sentence pairs, MaxSim computes
precision and recall

Phases of n-grams
To match n-grams, MAXSIM goes through a
sequence of three phases: lemma and POS
matching, lemma matching, and bipartite
graph matching
We will illustrate the matching process
using unigrams, then describe the extension
to bigrams and trigrams

Lemma and POS-tag matching: an exact
match on n-gram and POS tag is applied
In all n-gram matching, each n-gram in the
system translation can match at most
one n-gram in the reference translation
Lemma matching: for the remaining
unmatched n-grams, a relaxed condition of
just lemma match is used
Bipartite graph matching: for the remaining
unmatched unigrams, matches are made by
constructing a weighted complete bipartite
graph
The remaining unigrams form the nodes of
the graph
The weights are a sum of the WordNet
similarity between two word nodes and an
indicator of whether or not they
have the same POS tag

Calculation of F-score
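The F-score formula on this slide is a figure and is not reproduced. A sketch assuming the standard α-weighted harmonic mean of precision and recall (that MaxSim uses exactly this form is an assumption; the evaluation slide reports α = 0.9):

```python
def weighted_f(matched, num_sys, num_ref, alpha=0.9):
    """Alpha-weighted F over n-gram matches.

    matched: total weight of matched n-grams (1 per exact match,
    fractional for similarity-weighted bipartite matches).
    Uses the van Rijsbergen form F = P*R / (alpha*R + (1-alpha)*P);
    the exact form MaxSim uses is an assumption here."""
    if not (matched and num_sys and num_ref):
        return 0.0
    p, r = matched / num_sys, matched / num_ref
    return p * r / (alpha * r + (1 - alpha) * p)
```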

Scoring a Sentence Pair and the
Whole Corpus

For a sentence pair s, the MaxSim score is
calculated from Fs,n, the F-score defined
previously for each n-gram order.
[The equation is a figure and is not reproduced here.]

For the entire corpus, the sim-score is just
an arithmetic mean over all the individual
sentence-pair scores

Evaluation and Results


An alpha of 0.9 is used for these evaluations

References
1. Diana Perez and Enrique Alfonseca,
Application of the Bleu algorithm for
recognising textual entailments
2. Dan Roth, Recognizing Textual Entailment
3. Yee Seng Chan and Hwee Tou Ng, MAXSIM:
An Automatic Metric for Machine
Translation Evaluation Based on Maximum
Similarity

References
4. Masaaki Tsuchida and Kai Ishikawa, IKOMA
at TAC2011: A Method for Recognizing Textual
Entailment using Lexical-level and Sentence
Structure-level features
5. Andrew Hickl and Jeremy Bensley, A
Discourse Commitment-Based Framework for
Recognizing Textual Entailment
