A Survey of Text Question Answering Techniques
International Journal of Computer Applications (0975 – 8887)
Volume 53– No.4, September 2012
Answer extraction is the final component in a question answering system; it is the stage at which candidate answers are discriminated [5].

2. GENERAL ARCHITECTURE
The user writes a question by means of the user query interface. This query is then used to extract all the possible answers to the input question. The architecture of a question answering system, shown in Figure 1, works in five stages. The function of each stage is as follows [6]:

Figure 1. Architecture of Question-Answering System

2.1 Query Pre-processing
Given a natural language question as input, the question pre-processing module processes and analyzes the input question. This leads to the classification of the question as belonging to one of the types supported by the system.

2.2 Query Generation
In query generation, the input question is expressed in Query Logic Language (QLL).

2.3 Database Search
Here the stored database is searched for possible results; the related results that satisfy the given query under the selected keywords and rules are sent to the next stage.

2.4 Related Document
The result generated by the previous stage is stored as a document.

2.5 Answer Display
The result stored as a document is in wx format. It is then converted into the text required by the user and displayed.

3. TYPES OF QA SYSTEMS
QA systems can be divided into two major groups based on the methods they use. The first group of QA systems relies on simple natural language processing and information retrieval methods, while the other group depends on reasoning with natural language. The two kinds of QA systems are compared along dimensions such as the techniques used, the questions they deal with, and so on. Table 1 provides the details of this comparison.

Table 1. Characterization of QA systems

Dimension            | QA systems based on NLP and IR                                 | QA systems based on reasoning with NLP
Technique            | Syntax processing, Named Entity tagging, Information Retrieval | Semantic analysis or high reasoning
Data resource        | Free text documents                                            | Knowledge base
Domain               | Domain independent                                             | Domain oriented
Responses            | Extracted snippets                                             | Synthesized responses
Questions dealt with | Mostly wh-type questions                                       | Beyond wh-type questions
Evaluation           | Uses existing Information Retrieval evaluations                | N/A

3.1 Web Based Question Answering System
With the widespread usage of the internet, a tremendous amount of data has become available, and the web is one of the best sources from which to obtain information. Web based question answering systems use search engines (like Google, Yahoo, AltaVista, etc.) to retrieve web pages that potentially contain answers to the questions. The majority of these web based QA systems work on open domains, while some also work on specific domains. The wealth of information on the web makes it an attractive resource for getting quick answers to simple, factual questions [16]. The data available on the web is characterized by semi-structure, heterogeneity, and distributivity.

Web based QA systems mostly handle wh-type questions such as "Who killed Indira Gandhi?" or "Which of the following is correct?". These systems provide answers in various forms, such as text documents, XML documents, or Wikipedia articles. The common levels used by different web based question answering system architectures are as follows [10]:

Question Classification: This level helps produce correct answers by classifying the user query into the question type to which it belongs. Question classification is performed to provide better accuracy in the results.

Answer Extraction: This level extracts the possible correct answers for the different classes of questions.

Answer Selection: Among the possible answers obtained, ranking approaches are used to find the most accurate answers based on their weightage factors.

Answer classes are generally of factoid and non-factoid types. Factoid questions seek short, fact based answers such as names and dates, while non-factoid questions seek descriptions or definitions [27].

Given a user's natural language question, the system submits the question to a search engine, then extracts all possible answers from the search results according to the question type identified by the question classification module, and finally selects the most similar answers to return.
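The three levels above (classification, extraction, selection) can be sketched as a minimal pipeline. This is an illustrative sketch only: the search step is stubbed out, and all function names and the crude extraction heuristics are assumptions for this example, not taken from any of the surveyed systems.

```python
import re
from collections import Counter

def classify_question(question):
    """Map the leading wh-word to a coarse expected answer type."""
    first = question.strip().split()[0].lower()
    return {"who": "PERSON", "when": "DATE", "where": "LOCATION"}.get(first, "OTHER")

def extract_candidates(snippets, answer_type):
    """Very crude answer extraction: pull capitalized tokens (PERSON/LOCATION)
    or 4-digit years (DATE) from the retrieved snippets."""
    pattern = r"\b\d{4}\b" if answer_type == "DATE" else r"\b[A-Z][a-z]+\b"
    return [m for s in snippets for m in re.findall(pattern, s)]

def select_answer(candidates):
    """Answer selection: rank candidates by redundancy across snippets."""
    if not candidates:
        return None
    return Counter(candidates).most_common(1)[0][0]

def answer(question, search_engine):
    snippets = search_engine(question)  # in a real system: a web search API
    atype = classify_question(question)
    return select_answer(extract_candidates(snippets, atype))

# Stubbed "search engine" returning snippets, standing in for real web results.
fake_engine = lambda q: ["Bell invented the telephone.",
                         "The telephone was invented by Bell in 1876."]
print(answer("Who invented the telephone?", fake_engine))  # Bell
```

Selecting the most redundant candidate is what makes the web attractive for factoid questions: correct answers tend to recur across many retrieved snippets.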
The architecture of a web based question answering system is shown in Figure 2 [18]. The question processor takes the question as input and generates the asking point for the question, which in turn helps match the answer in the text.
A specific database or ontology query is constructed. From the result(s) returned by the queried information source, an answer object is generated, which forms the basis for subsequent natural language answer generation. This is shown in Figure 4 [15].

The perspectives of these types of questions may fluctuate, but the common goal is to obtain an accurate answer from the system. This section presents a classification of the different levels of questioners.

CASUAL QUESTIONERS: In this type, normal questions are posed to the system. It majorly focuses on the normal "perspective" to handle questions, e.g. "When was he born?" and "Who invented the telephone?". All these questions have a normal context.
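As a rough illustration of the query-construction step described above, the sketch below maps a simple question onto a structured query against a toy knowledge base, whose matching subjects form the answer object. The triple format, the last-word object heuristic, and all helper names are invented for this example and are not drawn from any surveyed system.

```python
# Toy knowledge base of (subject, relation, object) triples.
KB = [
    ("Bell", "invented", "telephone"),
    ("Edison", "invented", "light bulb"),
]

def build_query(question):
    """Construct a structured query from a question of the form
    'Who <verb> the <object>?'. The unknown subject slot is the asking
    point. Naively takes the last word as the object."""
    words = question.rstrip("?").lower().split()
    # e.g. ['who', 'invented', 'the', 'telephone']
    verb, obj = words[1], words[-1]
    return ("?", verb, obj)

def run_query(query, kb):
    """Return an answer object: all subjects matching (?, relation, object)."""
    _, rel, obj = query
    return [s for (s, r, o) in kb if r == rel and o == obj]

answers = run_query(build_query("Who invented the telephone?"), KB)
print(answers)  # ['Bell']
```

The returned list of subjects would then feed the natural language answer generation stage.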
4.1 Filtering Candidate Documents
The idea of paragraph filtering is based on the principle that the most relevant documents should contain the question keywords in a few neighboring paragraphs, rather than dispersed over the whole document. To exploit this idea, the positions of the question keywords in each document are examined. If the keywords are all found in some set of N successive paragraphs, then that set of paragraphs is returned; otherwise, the document is rejected from further processing. N is a configurable number that can be tuned based on an evaluation of system performance under different tolerances of keyword distance in documents.

4.2 Identifying the Quality of the Document
To estimate the quality of the selected paragraphs, a quality component is used. If the quality of the paragraphs is deemed inadequate, the system returns to the question keyword extraction module and alters the heuristics for extracting keywords from the question. The IR is then performed again using the new set of keywords retrieved from scratch. The reason for re-determining the question keywords stems from having included either too many or too few candidate paragraphs after paragraph filtering. In either case, new queries for the information retrieval system are produced by revisiting the question keywords component and either adding or dropping keywords. This feedback loop offers a form of retrieval context that ensures that only a 'reasonable' number of paragraphs is passed on to the answer processing module. Like several other parameters, exactly how many paragraphs constitute a 'reasonable' number should be configured based on performance testing. Next, paragraph ordering ranks the paragraphs according to their plausibility of containing the correct answer.

4.3 Standard Radix Sort Algorithm for Paragraph Ordering
This algorithm uses different scores to order the paragraphs: the number of words from the question that are recognized in the same sequence within the current paragraph window, the number of words that separate the most distant keywords in the current paragraph window, and the number of unmatched keywords in the current paragraph window. A paragraph window is defined as the smallest span of text required to capture each maximally inclusive set of question keywords within each paragraph. Radix sorting is performed over the paragraph windows of all the paragraphs.

4.4 Lexical and Syntactic Knowledge for IR
In this approach, the query is parsed to acquire the set of query terms used to calculate the TP information, instead of calculating TP among all possible combinations of query pairs. It differs from previous approaches in three points: first, no full parsing of the query is carried out; instead, the queries are chunked into sets of simple phrases such as noun phrases, prepositional phrases, and sequences of verbs. Second, in order to achieve more consistent behavior across different queries, different TP measures are applied depending on the lexical type of each query term. Third, TP measures are applied to phrases as well as terms, because phrases represent the concepts expressed in a text more accurately than single words.

4.5 Question Classification
Question answering is a variant of information retrieval that retrieves specific information rather than whole documents. A QA system takes a natural language question as input, converts the question into a query, and forwards it to an IR module. When a set of appropriate documents is retrieved, the QA system extracts an answer for the question. There are different methods of identifying answers. One of them makes use of a predefined set of entity classes. Given a question, the QA system classifies it into one of those classes based on the type of entity it is looking for, identifies entity instances in the documents, and selects the most likely one among all the entities of the same class as the question. There are different methods available for classifying questions. The following discusses important techniques for question classification, such as identification of question patterns, a semantic approach to question classification, and a sub-tree kernel using a support vector machine to improve the performance of question classification.

Functional Word Questions: All non-wh questions (except how) fall under the category of Functional Word Questions. These questions generally start with non-significant verb phrases.
Example: Name the Ranger who was always after Yogi Bear.

When Questions: When Questions start with the keyword "When" and are temporal in nature. The general pattern for When Questions is "When (do|does|did|AUX) NP VP X?", where AUX, NP, and VP denote auxiliary verbs, noun phrases, and verb phrases, '|' indicates a Boolean OR operation, and 'X' can be any combination of words playing an insignificant role in answer type determination.
Example: When did Israel begin turning the Gaza Strip and Jericho over to the PLO?

Where Questions: Where Questions start with the keyword "Where" and relate to locations. These may be natural entities such as mountains or geographical boundaries, man-made locations such as temples, or virtual locations such as the Internet or a fictional place. The general pattern for Where Questions is "Where (do|does|did|AUX) NP VP X?"
Example: Where is Italy?

Which Questions: The general pattern for Which Questions is "Which NP X?" The expected answer type of such questions is decided by the entity type of the NP.
Example: Which company manufactures sports kit?

Who/Whose/Whom Questions: Questions falling under this category have the general pattern "(Who|Whose|Whom) [do|does|did|AUX] [VP] [NP] X?" Here [word] indicates the optional presence of the term word in the pattern. These questions usually ask about an individual or an organization.
Example: Who wrote 'Hamlet'?

Why Questions: Why Questions always ask for certain reasons or explanations. The general pattern for Why Questions is "Why [do|does|did|AUX] NP [VP] [NP] X?"
Example: Why do heavier objects travel downhill rapidly?

How Questions: How Questions have two syntactic patterns: "How [do|does|did|AUX] NP VP X?" or "How [big|fast|long|many|much|far] X?" For the first pattern, the answer type is the explanation of some process, while the second pattern returns a number as a result.
Example: How did the jack get its name?
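These surface patterns translate almost directly into regular expressions. The sketch below is a minimal illustration of pattern-based question classification: the fixed AUX list and label names are assumptions for this example, and a real system would use a parser rather than a hand-picked auxiliary list.

```python
import re

# A small, assumed set of auxiliaries standing in for AUX in the patterns.
AUX = r"(?:do|does|did|is|was|are|were|can|could|will|would)"

# Regexes approximating the surface patterns described above.
PATTERNS = [
    ("WHEN",  rf"^when\s+{AUX}\b"),
    ("WHERE", rf"^where\s+{AUX}\b"),
    ("WHICH", r"^which\b"),
    ("WHO",   r"^(?:who|whose|whom)\b"),
    ("WHY",   rf"^why\s+{AUX}\b"),
    ("HOW",   r"^how\b"),
]

def classify(question):
    """Return the first matching question type, else FUNCTIONAL
    (non-wh questions, e.g. 'Name the Ranger ...')."""
    q = question.strip().lower()
    for label, pattern in PATTERNS:
        if re.search(pattern, q):
            return label
    return "FUNCTIONAL"

print(classify("When did Israel begin turning the Gaza Strip over to the PLO?"))  # WHEN
print(classify("Name the Ranger who was always after Yogi Bear."))                # FUNCTIONAL
```

The anchored `^` in each pattern is what keeps an embedded wh-word (the "who" inside the Functional Word example) from triggering a false match.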
What Questions: What Questions have several types of patterns. The most general regular expression for What Questions can be written as "What [NP] [do|does|did|AUX] [functional-words] [NP] [VP] X?" What Questions can ask for virtually anything, and many of them are disguised in the form of Functional Word Questions.
Example: What is considered the costliest disaster for the insurance industry?

5. MULTI-STREAM QUESTION ANSWERING
The selection of the final answer is complicated by the fact that it has to be made from various pools of ranked candidates found by different streams [25]; in other words, the correct answer must be selected from a given set of replies corresponding to different QA systems. In particular, a supervised multi-stream approach has been proposed that decides on the correctness of answers based upon a set of features that describe: (i) the compatibility between question and answer types, (ii) the redundancy of answers across streams, and (iii) the overlap and non-overlap information between the question–answer pair and the support text [14].

The Bangla lexicon consists of a good number of "loan-words" from Persian, Arabic, Portuguese, English and other languages, and a fairly large number of words are considered to be of unknown etymology. A translation based on transliteration and a table look-up method is proposed as an interface to the actual QA task. The implementation thus involves transliterating a Bangla question into an equivalent Latin-alphabet (English) version that can be used in an actual QA task. Most of the loan-words are pronounced almost the same way as they would be in the original language. The entire work can be divided into two components: the translation based on transliteration with table look-up, and the question answering part [16].
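The multi-stream selection of Section 5, with its features (i)–(iii), can be sketched as a simple scoring rule. This is only an illustration: the weights, the feature computations, and the data layout are assumptions for this example, whereas the cited approach learns the decision in a supervised way.

```python
from collections import defaultdict

def select_final_answer(question_type, streams):
    """streams: one ranked candidate list per QA stream;
    each candidate is a (answer, answer_type, support_text) tuple."""
    # (ii) redundancy: which streams proposed each answer string
    redundancy = defaultdict(set)
    for i, stream in enumerate(streams):
        for ans, _, _ in stream:
            redundancy[ans.lower()].add(i)

    best, best_score = None, float("-inf")
    for stream in streams:
        for ans, atype, support in stream:
            score = 2.0 if atype == question_type else 0.0   # (i) type compatibility
            score += len(redundancy[ans.lower()])            # (ii) cross-stream redundancy
            overlap = set(ans.lower().split()) & set(support.lower().split())
            score += 0.5 * len(overlap)                      # (iii) support-text overlap
            if score > best_score:
                best, best_score = ans, score
    return best

streams = [
    [("1876", "DATE", "the telephone was patented in 1876")],
    [("1876", "DATE", "bell patented the telephone in 1876"),
     ("Boston", "LOCATION", "bell worked in boston")],
]
print(select_final_answer("DATE", streams))  # 1876
```

Here "1876" wins because it is type-compatible, is proposed by both streams, and appears in its support text, which is exactly the intuition behind features (i)–(iii).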
[21] Zhenqiu, Liang. "Design of Automatic Question Answering System Based on CBR". Procedia Engineering 29, 2011, 981-985. DOI: 10.1016/j.proeng.2012.01.075. Elsevier.

[22] Badia, Antonio. "Question answering and database querying: Bridging the gap with generalized quantification". Journal of Applied Logic 5, 2007, 3-19. DOI: 10.1016/j.jal.2005.12.007. Elsevier.

[23] Gupta, Vishal and Lehal, Gurpreet S. "A Survey of Text Mining Techniques and Applications". Journal of Emerging Technologies in Web Intelligence, Vol. 1, No. 1.

[24] "Introduction to the special issue on question answering". Editorial, Information Processing and Management 47, 2011, 805-807. DOI: 10.1016/j.ipm.2011.04.004. Elsevier.

[25] Jijkoun, Valentin and de Rijke, Maarten. "Answer Selection in a Multi-Stream Open Domain Question Answering System".

[26] Kwok, Cody, Etzioni, Oren and Weld, Daniel S. "Scaling Question Answering to the Web". ACM Transactions on Information Systems, Vol. 19, No. 3, 2001, 242-262.

[27] Quarteroni, S. and Manandhar, S. "Designing an Interactive Open-Domain Question Answering System". Journal of Natural Language Engineering 1, 1-23.

[28] Molla, Diego and Vicedo, Jose Luis. "Question Answering in Restricted Domains: An Overview". Association for Computational Linguistics, 41-61.