
IEEE Sponsored World Conference on Futuristic Trends in Research and Innovation for Social Welfare (WCFTR’16)

Question Answering System on Education Acts Using NLP Techniques

Sweta P. Lende, M. Tech Student, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur, Maharashtra, India, [email protected]
Dr. M. M. Raghuwanshi, Professor, Department of Computer Technology, Yeshwantrao Chavan College of Engineering, Nagpur, Maharashtra, India, [email protected]

ABSTRACT— A Question Answering (QA) system in information retrieval is a system that automatically returns a correct answer to a question asked by a human in natural language, using either a pre-structured database or a collection of natural language documents. It presents only the requested information, instead of searching full documents as a search engine does. As the amount of information grows day by day, retrieving the exact fragment of information even for a simple query requires large and expensive resources. This paper describes the different methodologies and implementation details of question answering systems for general language, and also proposes a closed-domain QA system that handles documents related to sections of education acts, using NLP techniques to retrieve more precise answers.

Index Terms— Question Answering, NLP, Information Retrieval, Education Acts

I. INTRODUCTION

Nowadays many search engines are available. These search engines are highly successful and have remarkable capabilities, but their problem is that instead of giving a direct, accurate, and precise answer to the user's query or question, they usually provide a list of websites that might contain the answer to that question. Although the list of documents retrieved by the search engine holds a lot of information about the search topic, it may not contain the relevant information the user is looking for [11].

The basic idea behind a QA system is that the user just has to ask the question, and the system retrieves the most appropriate and correct answer for that question and gives it to the user. For example, for questions such as:

"Which vitamin is present in milk?"
"What is the birth place of Shree Krishna?"

users are more interested in answers such as "Vitamins A, E, D and K" and "Mathura" than in a large document in which they have to find the proper answer they are actually looking for. To solve this problem, at present most researchers working in various domains such as web mining, NLP, information retrieval, and information extraction have focused on open-domain and closed-domain QA systems. Systems can be divided into two types on the basis of the domain:

• Closed-domain question answering:
Closed-domain question answering refers to questions from a specific domain and can be seen as an easier task, because NLP systems can exploit domain-specific knowledge. It has very high accuracy but is limited to a single domain. Examples of such systems are those for medicine or automotive maintenance.

• Open-domain question answering:
Open-domain question answering deals with questions related to every domain. These systems usually have more data available from which to extract the answer. They can answer a question related to any domain, but with very low accuracy, since the domain is not specific.

General Architecture of a QA System:

In a QA system, the user poses a query as input in natural language. This query is then used to search the documents and extract all the possible answers for the user's query. The basic architecture of a question answering system is shown in Figure 1.

Fig.1 Architecture of Question-Answering System

978-1-4673-9214-3/16/$31.00 © 2016 IEEE


Authorized licensed use limited to: Zhejiang University. Downloaded on November 08,2024 at 11:52:29 UTC from IEEE Xplore. Restrictions apply.

Query Pre-processing: This is the first step in a QA system. The input to the system is a question asked by the user in natural language; the overall function of this module is to process and analyze the input question.

Query Generation: In query generation, Query Logic Language (QLL) is used to express the input question.

Database Search: The possible results are searched for in the stored database; the related data that satisfy the given query with the selected keywords are sent to the next stage.

Related Document: The result generated by the previous stage is stored as a document.

Answer Display: The stored result is converted into the precise text the user is looking for, and that answer is displayed to the user.

II. RELATED WORK

In paper [1], the authors developed a geographical-domain question answering system that answers user questions about various cities. To design the system, the authors first created a knowledge-base document and performed document pre-processing, which involved noise removal, tokenization, sentence splitting, and document tagging using a named entity tagger, a parser, and the WordNet tool. Question processing, document processing, and answer processing are the main elements of the system. Question processing deals with sub-classification and reformulation of the question; question classification is done by a plain pattern-matching algorithm. Passage retrieval is then performed over the pre-processed and indexed corpus. The retrieval module produces candidate answers, which become the input for answer extraction, where ranking is performed based on semantic relations with the help of the WordNet tool. After ranking, the answer with the maximum rank is displayed as the final answer.

In paper [2], the author proposed IPedagogy, a question answering system that works with natural language queries and retrieves answers from selected information clusters, reducing the search space of information retrieval. In addition, IPedagogy is empowered by several natural language processing techniques that direct the system to extract the exact answer for a given query. The system was evaluated using mean reciprocal rank, achieving an average accuracy of 0.73 over 10 sets of questions, each set consisting of 35 questions.

The authors of paper [3] developed a new architecture for question answering in Malayalam, which finds answers to Malayalam questions by analyzing documents in the Malayalam language. It handles four classes of Malayalam questions for a closed domain. The system is divided into three modules: question analysis, text retrieval and answer snippet extraction, and answer identification.

a. Question Analysis:
This module takes a single question as input. Its aim is to identify the question keywords and the expected set of answer documents. It follows an NLP algorithm for pre-processing.

b. Text Retrieval and Answer Snippet Extraction:
After the query words are identified, answer candidates are retrieved from the collected documents for answer identification. The indexed documents with the greatest keyword overlap with the question keywords are selected for answer extraction. For this, the module counts the matches between the query keywords and each sentence. The sentences with some match to the query keywords are selected as answer candidates and are represented as a triplet containing the sentence, its index, and the match count. The index, assigned at the time of text splitting, is used to extract the actual sentence, and the match count gives the number of terms that match the question. These candidate answers are passed to the next module for selection.

c. Answer Identification:
Answer identification has two sub-modules: scoring and ranking, and answer extraction. In scoring and ranking of the candidate answers, the winner candidate is selected using matching window sizes; the candidate answer with the highest score is selected as the winner and is passed on for answer extraction. Answer extraction is done using named entity recognition: the expected named entity of the question is found by analyzing the question word, and then the words surrounding the matched question word are analyzed to identify the expected answer entity.

The restricted-domain question answering system of [4] describes a new architecture divided into four modules: question processing, document processing, paragraph extraction, and answer extraction.

a. Question Processing Module:
In the question processing module, the user's question is processed to extract important information from it, in the following steps:
1) Find the type of the question from its Wh word.
2) Find the expected type of the answer.
3) Get the keywords from the question.

b. Document Processing Module:
In the document processing module, the documents relevant to the given question are retrieved and processed, in the following steps:
1) After processing the question, search for documents related to it using a reliable search engine.
2) Take the top ten relevant documents.
3) Extract the relevant content from these documents.
4) Save these contents in a text file.
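As an illustration, the stage-by-stage flow of the basic QA architecture described above (query pre-processing, database search, related-document collection, answer display) can be sketched as a toy program. Everything here, the function names, the stop-word list, and the two-document in-memory "database", is an illustrative assumption, not the implementation of any of the surveyed systems.

```python
# Toy end-to-end sketch of the basic QA pipeline described above.
# All names and data are illustrative assumptions.

STOPWORDS = {"what", "is", "the", "of", "in", "a", "an", "to"}

# Toy "database": maps a document name to its text.
DATABASE = {
    "doc1": "Milk contains vitamin A vitamin D vitamin E and vitamin K.",
    "doc2": "Shree Krishna was born in Mathura.",
}

def preprocess_query(question):
    """Query pre-processing: tokenize and drop stop words."""
    tokens = question.lower().strip("?").split()
    return [t for t in tokens if t not in STOPWORDS]

def database_search(keywords):
    """Database search: keep documents containing any query keyword."""
    return {name: text for name, text in DATABASE.items()
            if any(k in text.lower() for k in keywords)}

def answer_display(related_docs):
    """Answer display: return the matched documents' text as the answer."""
    return " ".join(related_docs.values())

keywords = preprocess_query("What is the birth place of Shree Krishna?")
related = database_search(keywords)   # related-document stage
print(answer_display(related))
```

A real system would replace the keyword test in `database_search` with an indexed lookup, as the proposed approach later does.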

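The first question-processing step of [4], deriving the question type and the expected answer type from the Wh word, might look like the following sketch. The Wh-word-to-answer-type table is an assumption made for illustration, not taken from the paper.

```python
# Sketch of Wh-word question classification with an expected answer type.
# The mapping table below is an illustrative assumption.

WH_ANSWER_TYPE = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "what": "DEFINITION/ENTITY",
    "which": "ENTITY",
    "how": "MANNER/QUANTITY",
    "why": "REASON",
}

def classify_question(question):
    """Return (wh_word, expected_answer_type) for a natural-language question."""
    for token in question.lower().strip("?").split():
        if token in WH_ANSWER_TYPE:
            return token, WH_ANSWER_TYPE[token]
    return None, "UNKNOWN"

print(classify_question("Where is the birth place of Shree Krishna?"))
```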

c. Paragraph Extraction Module:
This is the module where paragraph extraction and sentence extraction take place in order to find the most probable answer to the question at hand.
Steps in the Paragraph Extraction Module:
1) Run the paragraph extractor over the text file obtained from the document retrieval module.
2) If the question is of Definition or Factoid type, send the extracted paragraph to the next sub-module.
The output of the document processing module (the text file) and the output of the question processing module (stored in the information repository after the question is processed) are passed to this second-to-last module, the paragraph extraction module. Here the paragraph extractor extracts paragraphs from the text file with the help of the information obtained from the information repository. It extracts only those paragraphs which share keywords with the question keywords, so its output is the text that contains at least one keyword of the question.

d. Answer Extraction Module:
This module presents algorithms for extracting the potential answer for all three categories of questions: Definition, Descriptive, and Factoid. Using the Stanford parser toolkit, the authors find the grammatical structure of the question and, on the basis of that structure, find the potential answer in the dataset, i.e., those sentences which have the same head words as the question.

In paper [5], the authors proposed a Chinese question answering system that uses real-time network information retrieved by a search engine. The system consists of three modules: question analysis, information retrieval, and answer extraction. The authors also used many NLP technologies, such as question classification, syntactic dependency parsing, and named entity recognition. In their experiments, many correct answers were not the best answers with the highest scores; the main reason is the lack of sufficient evidence for the correct answer's score, so the answer scoring module is the main focus of future work.

III. PROPOSED APPROACH

From the literature survey of question answering systems, we can conclude that closed-domain QA systems give more accurate answers than open-domain QA systems. In the current scenario, there is no QA system that exactly answers queries over documents related to education acts while ensuring correct answers. So the idea of developing a question answering system on education acts is proposed, as shown in Fig. 5.

Fig.5 Proposed Architecture of QA

The input to the system will be a query related to education acts or other education-related information, for example "What is the duty of a parent to secure the children's education?" or "What are the funding authorities of a school?". The question keywords are obtained by removing stop words from the question and performing stemming on it. A dataset of education act documents is generated to form an index term dictionary, a metadata knowledge base storing the related keywords of each document. Using these keywords, the original passages or sentences are tagged to give candidate answers from the answer extractor. According to the given question, the constraints and candidate answers are matched against each other, and the probable answer with the highest score is retrieved as the final answer. The system will be built to produce accurate answers for trained questions, and will then be tested to measure its accuracy on untrained questions.

IV. DETAILS OF IMPLEMENTATION METHODS

1. Collection and Study of the Relevant Data Set:
The first design module of the proposed work is the collection and study of relevant data. For this work, the relevant data set consists of records related to education acts. The required data set is collected and studied from different websites related to education acts. Based on these observations, the next module of the system is corpus creation.

A. Creation of Dataset (Corpus):
The second phase of the question answering system is to create the corpus. As we have to design a closed-domain QA
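The paragraph extractor of [4] keeps only those paragraphs that share at least one keyword with the question. A minimal sketch of that filter follows; the function and variable names are assumptions for illustration, not taken from the paper.

```python
def extract_paragraphs(text, question_keywords):
    """Keep only paragraphs that contain at least one question keyword."""
    keywords = {k.lower() for k in question_keywords}
    selected = []
    for para in text.split("\n\n"):       # paragraphs separated by blank lines
        words = set(para.lower().split())
        if keywords & words:              # at least one shared keyword
            selected.append(para)
    return selected

doc = "Funding authorities maintain schools.\n\nThe weather was pleasant."
print(extract_paragraphs(doc, ["funding", "school"]))
```

Note that exact-token matching misses inflected forms ("school" vs. "schools"), which is one reason the proposed approach stems both documents and queries before matching.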

Authorized licensed use limited to: Zhejiang University. Downloaded on November 08,2024 at 11:52:29 UTC from IEEE Xplore. Restrictions apply.
IEEE Sponsored World Conference on Futuristic Trends in Research and Innovation for Social Welfare (WCFTR’16)

system, the most important work is to decide the domain. There are already many QA systems for different closed domains, so we deal with a new domain: answering user queries on education acts and information about education. The documents related to each section of the education acts need to be understood in different ways by different users. Information about education-related documents is available to interested users on different websites, but exact answers to different questions about these documents can only be given by a QA system. We have therefore gone through different websites and taken the text data from www.legislation.gov.uk. For each section of the education acts there is one text file; thus, for 583 sections, 583 text files are stored as the corpus. The education act text files contain information about each section of education, such as information about schools, funding authorities of schools, areas of schools, duties of teachers towards students, duties of parents to secure their children's education, and much more. In total, 583 sections related to education acts are available.

B. Preprocessing:
After the corpus is created, some preprocessing operations are performed on each text file of the corpus. The major pre-processing tasks are stop-word removal and stemming.
a. Stop-word Removal:
Removing stop words reduces the dimensionality of the term space. The words found most commonly in text documents are prepositions, articles, pronouns, etc., which do not carry the meaning of the documents. Stop words are eliminated from documents because they are not considered keywords in information retrieval applications. For example, English stop words such as "is", "for", "the", and "in" are removed from each text file of the dataset by maintaining an English stop-word dictionary.
b. Stemming:
Stemming is the important process by which the different forms of a word are replaced with the basic root word; e.g., automate(s), automatic, and automation are all reduced to "automat". For stemming, an English stem-word dictionary (a file containing the set of stem words) is maintained for extracting keywords.

2. Index Term Dictionary:
After pre-processing, the extracted keywords are stored in the index term dictionary. The extracted keywords contain only stem words, obtained after performing stemming. The index term dictionary is created using Java and stored as a table in MySQL. The dictionary forms a structure like an inverted index containing two columns, term and posting: the term is an extracted keyword, and the posting is the name of a file. For each term t, we store a list of all the documents that contain t. The following figure shows the structure of the index term dictionary.

Figure 5.1: Structure of index term dictionary

3. Question Preprocessing:
The given input query is preprocessed by performing some preprocessing operations on it, i.e., POS tagging, stop-word removal, and stemming.
a. User Query:
The user enters a query related to the education system. For example, the user can ask "What is the duty of a parent to secure children's education?", "What is a primary school?", or any other query related to the education system.
b. POS Tagging:
First we perform POS (part of speech) tagging on the input query to tag each word of the user query with its type, such as verb or noun. The Stanford POS tagger is used to tag each word.
c. Extracted Keywords:
The keywords are extracted from the user query. They are obtained by removing symbols and stop words from the query; stemming is also applied to the keywords so that they match the index term dictionary terms for document retrieval. English stop-word and stem-word dictionaries are maintained to extract the keywords. For example:
• In case of POS tagging:
Input: What is the duty of parents to secure education of children?
Output: What/WP is/VBZ the/DT duty/NN of/IN parent/NN to/TO secure/VBZ education/PRP children/NNS
• After removing stop words from the question:
Input: What is the duty of parents to secure education of children?
Output: duty parents secure education children
• After stemming:
Output: keyword [duty parent secure education children]
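The keyword-extraction example above, stop-word removal followed by stemming, can be reproduced with a small sketch. The stop-word set and the crude suffix-stripping stemmer below are stand-ins for the English stop-word and stem-word dictionaries the paper maintains, not the paper's actual dictionaries.

```python
# Sketch of question pre-processing: stop-word removal, then stemming.
# The stop-word list and the suffix-stripping stemmer are illustrative
# stand-ins for the dictionary files described above.

STOPWORDS = {"what", "is", "the", "of", "to", "a", "an", "in", "for"}

def remove_stopwords(question):
    tokens = question.lower().strip("?").split()
    return [t for t in tokens if t not in STOPWORDS]

def stem(word):
    """Very crude suffix stripping, e.g. parents -> parent."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_keywords(question):
    return [stem(t) for t in remove_stopwords(question)]

print(extract_keywords("What is the duty of parents to secure education of children?"))
# -> ['duty', 'parent', 'secure', 'education', 'children']
```

This reproduces the paper's worked example output, keyword [duty parent secure education children].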

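The index term dictionary described above is an inverted index mapping each stemmed term to a posting list of file names; the paper builds it in Java and stores it as a MySQL table. An equivalent in-memory sketch, with hypothetical per-file keyword lists rather than the actual dataset:

```python
from collections import defaultdict

def build_index(corpus):
    """Build term -> posting list (the set of file names containing the term)."""
    index = defaultdict(set)
    for filename, keywords in corpus.items():
        for term in keywords:
            index[term].add(filename)
    return index

# Hypothetical stemmed keywords per corpus file (not the actual dataset).
corpus = {
    "section7.txt":  ["duty", "parent", "secure", "education", "child"],
    "section11.txt": ["fund", "authority", "school"],
    "section12.txt": ["duty", "teacher", "school"],
}
index = build_index(corpus)
print(sorted(index["duty"]))   # files whose posting list contains "duty"
```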

4. Document Retrieval:

In the document retrieval step, the keywords extracted from the query are matched against the terms of the index term dictionary. After matching, only the document ids of the matched keywords are retrieved. For more than one keyword, the system takes the intersection of all the document ids in which these terms are present, so that only the documents in which all the keywords are found are retrieved as candidate answer passages. For example, if the term "duty" is present in files 1, 2, 3, 5, and 8 and the term "parent" is present in files 1, 2, 3, 4, 6, and 8, the system retrieves files 1, 2, 3, and 8, in which all the keywords obtained from the user query are matched.

5. Keyword Ranking with Documents:

In keyword ranking, we first compute a score between the query keywords and each of the files obtained by document retrieval. To find the score we use the Jaccard similarity function: we first find the intersection between the extracted query keywords and the keywords of each file retrieved by document retrieval.

Score = |A ∩ B| / |A ∪ B|

where A = the set of extracted query keywords and B = the set of a file's keywords. The top one or two files by score contain the accepted answer.
E.g., input query: What is the duty of parent to secure children education?
Output: the document extracted after Jaccard ranking.

Fig. File name: section11.txt

6. Answer Extraction:

In answer extraction, POS tagging is applied to all the filtered documents obtained from keyword ranking. After applying POS tagging, we check the sense (part of speech) agreement between the keywords extracted from the query and the filtered documents.
E.g., suppose the keywords extracted from the query are [duty, parent, education, children]. We check where
[duty, parent] are used as nouns in the document,
[education] is used as a pronoun in the document,
[children] is tagged NNS.
After checking the sense agreement between query and document, we extract the paragraph which carries the same sense as the query.

7. Answer:

After answer extraction, we select the answer obtained in the answer extraction step and present it to the user as the information the user is looking for.

Fig. File name: Section7.txt
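The multi-keyword document retrieval step described above, intersecting the posting lists so that only files containing all query keywords survive, can be sketched as follows, using the paper's own duty/parent example. The `retrieve` function name is an assumption for illustration.

```python
def retrieve(index, keywords):
    """Intersect posting lists so only files containing ALL keywords survive."""
    postings = [index.get(k, set()) for k in keywords]
    return set.intersection(*postings) if postings else set()

# The paper's example: "duty" occurs in files 1, 2, 3, 5, 8 and
# "parent" in files 1, 2, 3, 4, 6, 8.
index = {"duty": {1, 2, 3, 5, 8}, "parent": {1, 2, 3, 4, 6, 8}}
print(sorted(retrieve(index, ["duty", "parent"])))   # [1, 2, 3, 8]
```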

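The Jaccard scoring used in keyword ranking, Score = |A ∩ B| / |A ∪ B|, can be sketched as below. The per-file keyword sets are hypothetical examples, not the actual corpus.

```python
def jaccard(a, b):
    """Score = |A ∩ B| / |A ∪ B| for two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical keyword sets: the query's and each retrieved file's.
query_keywords = ["duty", "parent", "secure", "education", "children"]
file_keywords = {
    "section7.txt":  ["duty", "parent", "secure", "education", "children"],
    "section11.txt": ["fund", "authority", "school"],
}
ranked = sorted(file_keywords,
                key=lambda f: jaccard(query_keywords, file_keywords[f]),
                reverse=True)
print(ranked[0])   # the best-scoring file
```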

This is the final answer given to the user.

V. CONCLUSION

The objective of this paper is to review some of the methods and implementation techniques used for building question answering systems. On the basis of the literature survey we can conclude that a question answering system using NLP techniques is more complex than other types of information retrieval systems. QA systems can be developed for resources such as the web and semi-structured and structured knowledge bases. Closed-domain QA systems give more accurate answers than open-domain QA systems, but are restricted to a single domain. A closed-domain QA system over documents related to education acts, using NLP techniques and information retrieval, is proposed to give accurate and suitably correct answers to users' queries.

References:
[1]. Amit Mishra, Nidhi Mishra and Anupam Agrawal, "Context-Aware Restricted Geographical Domain Question Answering System", 2010 International Conference on Computational Intelligence and Communication Networks.
[2]. Rivindu Perera, "IPedagogy: Question Answering System Based on Web Information Clustering", 2012 IEEE Fourth International Conference on Technology for Education.
[3]. Pragisha K. and Dr. P. C. Reghuraj, "A Natural Language Question Answering System in Malayalam Using Domain Dependent Document Collection as Repository", International Journal of Computational Linguistics and Natural Language Processing, Vol. 3, Issue 3, March 2014, ISSN 2279-0756.
[4]. Payal Biswas, Aditi Sharan and Nidhi Malik, "A Framework for Restricted Domain Question Answering System", 2014 International Conference on Issues and Challenges in Intelligent Computing Techniques.
[5]. Zeng-Jian Liu, Xiao-Long Wang and Qing-Cai Chen, "A Chinese Question Answering System Based on Web Search", 2014 International Conference on Machine Learning.
[6]. Jibin Fu, Keliang Jia and Jinzhong Xu, "Domain Ontology Based Automatic Question Answering", 2009 International Conference on Computer Engineering and Technology.
[7]. Abdullah M. Moussa and Rehab F. Abdel-Kader, "QASYO: A Question Answering System for YAGO Ontology", International Journal of Database Theory and Application, Vol. 4, No. 2, June 2011.
[9]. Zeng-Jian Liu, Xiao-Long Wang and Qing-Cai Chen, "A Question Answering System on Web Search", 2014 International Conference on Machine Learning.
[10]. Lahiru Samarakoon and Sisil Kumarawadu, "Automated Question Answering for Customer Helpdesk Applications", 2011 6th International Conference on Industrial and Information Systems.
[11]. Tiansi Dong and Ulrich Furbach, "A Natural Language QA System as a Participant in Human Q and A Portals".
[12]. Anette Frank, Hans-Ulrich Krieger, Feiyu Xu, Hans Uszkoreit, Berthold Crysmann, Brigitte Jörg and Ulrich Schäfer, "Question Answering from Structured Knowledge Sources", German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, 66123 Saarbrücken, Germany, available online 27 January 2006.
[14]. Pum-Mo Ryu, Myung-Gil Jang and Hyun-Ki Kim, "Open Domain Question Answering Using Wikipedia-Based Knowledge Model", Information Processing and Management 50 (2014) 683-692, Elsevier.
[15]. Adel Tahri and Okba Tibermacine, "DBpedia Based Factoid Question Answering System", International Journal of Web & Semantic Technology (IJWesT), Vol. 4, No. 3, July 2013.
[16]. Menaka S and Radha N, "Text Classification Using Keyword Extraction Technique", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 12, December 2013.
[17]. Zhang Yu, Liu Ting and Wen Xu, "Modified Bayesian Model Based Question Classification", vol. 19, pp. 100-105.

