Automatic Question Paper Generation, According To Bloom's Taxonomy, by Generating Questions From Text Using Natural Language Processing
ISSN No:-2456-2165
Abstract:- The ongoing research on "Natural Language Processing and its applications in the educational domain" has witnessed various approaches for question generation from paragraphs. Despite the existence of numerous techniques for the automatic generation of questions, only a few have been implemented in real classroom settings. This research paper reviews existing methods and presents an AQGS (Automatic Question Generation System) that uses Natural Language Processing libraries like NLTK and Spacy to suggest questions from a passage provided as input. The question paper is generated by randomly selecting questions for a specific level of Bloom's Taxonomy. We conclude by determining the efficacy of the AQGS using performance measures like accuracy, precision, and recall.

Keywords:- Question Generation, Bloom's Taxonomy, Natural Language Processing (NLP), Natural Language Toolkit (NLTK), Spacy, POS Tagging, Named Entity Recognizer (NER).

I. INTRODUCTION

Random selection of questions from a large question bank eliminates any possibility of human bias, making every test paper unpredictable. The system thus proves beneficial for online school examinations, especially during times of the pandemic, by creating new questions whose answers are not directly available on the internet and thereby reducing student malpractice. The questions generated can be used by teachers to set test papers, and students can leverage them for self-evaluation to gauge their grasp of a particular topic. This automation reduces cost and labor, rules out human error, and arms the user with a fast, easy-to-use question-generating tool.

Generally, the three major components of Question Generation are input pre-processing, sentence selection, and question formation. The input text is filtered by removing unnecessary words and punctuation that do not contribute to the meaning of the sentence. The sentences or phrases from which questions can be formed are segregated from the remaining text. These are mapped to the type of question (what, where, when, etc.) that can be formulated from the selected sentence, followed by the final step of framing a grammatically sound question.
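The three-stage pipeline described above can be sketched in a few lines of Python. This is a minimal, library-free illustration only; the function names, the stop-word list, and the toy "because implies why" rule are invented for this example and are not the AQGS implementation itself:

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one (e.g. NLTK's).
STOP_WORDS = {"the", "a", "an", "of", "to", "and"}

def preprocess(text):
    """Stage 1: split into sentences, strip punctuation and stop words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    cleaned = []
    for s in sentences:
        s = re.sub(r"[^\w\s]", "", s)  # drop punctuation
        words = [w for w in s.split() if w.lower() not in STOP_WORDS]
        cleaned.append(" ".join(words))
    return cleaned

def select_sentences(sentences):
    """Stage 2: keep sentences long enough to support a question (toy criterion)."""
    return [s for s in sentences if len(s.split()) >= 5]

def form_question(sentence):
    """Stage 3: map a sentence to a question type (toy rule: 'because' -> why)."""
    if "because" in sentence.lower():
        topic = sentence.lower().split("because")[0].strip()
        return f"Why {topic}?"
    return f'What is meant by: "{sentence}"?'

text = "He went bankrupt because he took too many loans. Sad."
for s in select_sentences(preprocess(text)):
    print(form_question(s))
```

A production system replaces each toy rule with the NLP machinery discussed below: POS tagging and NER for selection, and grammar-aware transformations for question formation.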
II. LITERATURE REVIEW

Techniques for the implementation of each phase, along with recent trends and challenges for MCQ generation, are presented in the paper.

In Automatic Cloze Question Generation (CQG) [2], an article in English is provided as input, from which the system generates a list of cloze questions: sentences comprising one or multiple blanks. Sentence selection, keyword selection (potential blank selection), and distractor selection (selecting alternate answers for the blank) are the major components of CQG. To begin with, potential sentences are selected, followed by keyword selection on the basis of NER; finally, domain-specific distractors are generated based on the knowledge base provided to the model. Manual evaluation of the system is done for sentence, keyword, and distractor selection.

In Automatic Question Generation using Discourse Cues [3], the system can be viewed as content selection plus question formation. The emphasis is on recognizing discourse markers and discerning important discourse relations like causal, temporal, contrast, result, etc. After identification of relevant text for framing a question, seven discourse connectives are specified for finding the type of Wh-question (why, where, which, when), and syntax transformations are performed. Semantic and syntactic evaluation of the system is done.

In Semantic Based Automatic Question Generation using Artificial Immunity [4], both SRL (Semantic Role Labelling) and NER (Named Entity Recognition) are used for the conversion of input text into a semantic pattern. An artificial immune system uses feature extraction, learning, storage, and associative retrieval to classify patterns according to question type (who, when, where, why, and how). The input sentence is mapped into a pattern through SRL (used for feature extraction) and NER, and depending on the question type, a sentence pattern is realized. 170 sentences were mapped into 250 patterns that were used for training and testing. For evaluation, recall, precision, and F-measure were used. The proposed model has a classification accuracy of over 95%, and 87% in generating new question patterns.

A Combined Approach Using Semantic Role Labelling and Word Sense Disambiguation for Question Generation and Answer Extraction [5] introduces a joint model of question formation and answer identification using Natural Language Processing. The question generation part makes use of SRL and WSD (Word Sense Disambiguation) techniques, while the answer extraction part uses NER along with SRL. Simple sentences are provided as input to the model. The question and answer pairs obtained for a set of sentences were analyzed to evaluate the accuracy of question generation and answer extraction.

In Automatic Question Generation from Given Paragraph [6], the paper presents a web application in which simple and complex Wh-questions are generated from a paragraph. Each sentence is mapped to a set of predefined rules depending on the verb, subject, object, and prepositions it comprises. A POS tagger is used to label the part of speech of each word, a dependency parser analyses the grammatical structure of the sentence and the relations between words, and a Support Vector Machine (SVM) is used for classification. Human evaluation is done to check the semantic and syntactic accuracy of the output generated.

Similarities in Words Using Different POS Taggers [7] presents a comparison of four different POS taggers (NLTK, Freeling, NLP Tagger, and Cognitive POS Tagger) to identify the proper tag for a given text. The paper analyses the results of each tagger for Wh-questions like how, what, which, where, who, and why. Out of 350 wh-questions, 154 had contrasting tags across these four tools, and the results can be summarized by stating that NLTK outperforms the other taggers in labeling words with the right part of speech. We use the NLTK tagger for POS tagging, along with other NLTK algorithms like the Lancaster Stemmer and WordNet Lemmatizer, which are discussed in the following sections.

III. SPACY

Spacy is one of the go-to libraries of NLP enthusiasts, built specifically to process and help us understand large volumes of text. The Spacy framework, written in Cython, is a fast library that supports multiple languages like English, Spanish, French, German, Dutch, Italian, Greek, etc. It comprises various models of trained vectors, vocabularies, syntaxes, and entities, which are loaded based on requirements. For the English "core web" family, the default package is 'en_core_web_sm', where 'sm' stands for small. Spacy has three models in the English language: small, medium, and large. As the names suggest, these models vary in size and accuracy. The proposed system loads the large package, which is used for entity recognition, for better accuracy and precision.

>>> import spacy
>>> nlp = spacy.load("en_core_web_lg")

IV. NATURAL LANGUAGE TOOLKIT

NLTK, the Natural Language Toolkit, is the mother of all NLP libraries. It provides lexical resources, over 50 corpora, and a set of libraries for tokenization, stemming, classification, tagging, and semantic reasoning, among many others. It is a platform for developing Python programs that require natural language processing. NLTK is a crucial component of the AQGS system presented in this paper.

A. Lancaster Stemmer
Stemming in Natural Language Processing refers to the process of reducing words to their stem or root. This word stem may not be a dictionary word; it is simply a smaller or equal form of the word. For instance, 'retrieves', 'retrieval', and 'retrieved' reduce to the root 'retrieve'. Porter's Stemmer, Lovins Stemmer, Dawson
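Suffix-stripping stemming of the kind just described can be illustrated with a toy stemmer. The rule table below is invented for this example and is far simpler than the real Lancaster algorithm (available in NLTK as `nltk.stem.LancasterStemmer`), which applies a much larger rule set iteratively:

```python
# Toy suffix-stripping stemmer (illustrative only; NOT the Lancaster algorithm).
# Each rule is (suffix to match, replacement) and the table is invented here.
SUFFIX_RULES = [
    ("val", "ve"),  # retrieval  -> retrieve
    ("ved", "ve"),  # retrieved  -> retrieve
    ("ves", "ve"),  # retrieves  -> retrieve
    ("ing", ""),    # running    -> runn (stems need not be dictionary words)
    ("s", ""),      # loans      -> loan
]

def stem(word):
    """Apply the first matching suffix rule, keeping at least a 3-letter base."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

for w in ["retrieves", "retrieval", "retrieved", "loans"]:
    print(w, "->", stem(w))
```

As the 'running' rule shows, a stem such as 'runn' need not be a valid word, which is exactly the dictionary-independence property described above.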
TABLE 2. BLOOM'S TAXONOMY (FROM LOWER TO HIGHER ORDER THINKING SKILLS), QUESTION CUES AND QUESTION STEMS

Knowledge (factual recall, remembrance of major dates, events, etc.)
  Question Cues: define, who, when, where, quote, name, identify, label
  Question Stems: Who wrote…?, When did…?, Who said…?, Where did…?, Who are the…?

Comprehension (understanding, comparison, interpretation)
  Question Cues: differentiate, distinguish, describe, summarize, discuss, predict, list, contrast
  Question Stems: What is the difference between…?, What is the summary of…?, What is the predicted outcome of…?, What is the sequence of…?

Application (visualize application in real life, solve problems using methods or theories)
  Question Cues: demonstrate, calculate, solve, illustrate, examine, test, classify
  Question Stems: How to solve…?, What is the classification of…?, How to examine…?, Demonstrate the process of…

Analysis (identification, pattern recognition, analysis)
  Question Cues: analyse, explain, classify, connect, infer, probe
  Question Stems: What proves that…?, How is this similar/different to…?, What is the problem with…?, Why did …precede/follow…?

Evaluation (choose, verify evidence, assess theories)
  Question Cues: assess, rank, grade, support, recognize, conclude, select, measure, convince
  Question Stems: How effective is…?, What would you choose…?, How would you rank/grade…?, What does the argument support…?

Creativity (independent creative thinking, shift perspective, innovate)
  Question Cues: design, innovate, hypothesise, conceive, craft, compose, invent
  Question Stems: Can you imagine how…?, How would you invent…?, How would you respond…?, What design would you make for…?
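The cue lists of Table 2 can drive a simple classifier that tags each generated question with a Bloom level; combined with random sampling, this yields an unbiased paper. A minimal sketch follows, with abridged cue lists and an invented question bank (neither is the system's actual data):

```python
import random

# Abridged cue lists from Table 2, mapping cues to Bloom categories.
BLOOM_CUES = {
    "Knowledge": ["who", "when", "where", "define", "name"],
    "Analysis": ["why", "explain", "what proves"],
    "Application": ["how to", "demonstrate", "calculate"],
}

def classify(question):
    """Return the first Bloom category whose cue appears in the question."""
    q = question.lower()
    for category, cues in BLOOM_CUES.items():
        if any(cue in q for cue in cues):
            return category
    return "Unclassified"

def build_paper(bank, counts, seed=None):
    """Randomly pick counts[category] questions per category from the bank."""
    rng = random.Random(seed)  # Mersenne Twister, as used by Python's random module
    by_category = {}
    for q in bank:
        by_category.setdefault(classify(q), []).append(q)
    paper = []
    for category, n in counts.items():
        paper.extend(rng.sample(by_category.get(category, []), n))
    return paper

bank = [
    "Who wrote Hamlet?",
    "When did World War II end?",
    "Why did he go bankrupt?",
    "How to solve a quadratic equation?",
]
print(build_paper(bank, {"Knowledge": 1, "Analysis": 1}))
```

A real classifier would also consult POS tags, as the system does, since bare substring matching misfires on words that merely contain a cue.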
>>> [('He', 'PRP'), ('went', 'VBD'), ('bankrupt', 'RB'), ('because', 'IN'), ('he', 'PRP'), ('took', 'VBD'), ('too', 'RB'), ('many', 'JJ'), ('loans', 'NNS'), ('.', '.')]

QnA pair generated:
>>> [['He went bankrupt because he took too many loans.', 'Why did he went bankrupt ?']]

QnA pair with the Bloom's Taxonomy level identified for the question stem (Why did…) according to Table 2:
>>> [['He went bankrupt because he took too many loans.', 'Why did he went bankrupt ?', 'Analysis']]

Finally, the question and answer pairs generated are stored in a database with details about the course and module, along with the level of Bloom's Taxonomy to which each question belongs. Additionally, the spreadsheet or existing question bank of a particular professor for a particular test can be integrated into the database: the greater the number of questions, the smaller the chance of question repetition.

B. Question Paper Generation Using Bloom's Taxonomy
The system is provided with a predefined list of question cues and question stems for each category of Bloom's Taxonomy. Using the POS tags of the tokens in the question, the appropriate level of Bloom's Taxonomy is identified for each question. The examiner specifies the number of questions per category on the UI of the system, for example 5 "Knowledge"-based questions, 3 "Application"-based questions, and 2 "Analysis"-based questions. Out of the question repository, 5 random questions from the "Knowledge" category, 3 from the "Application" category, and 2 from the "Analysis" category are then chosen to generate the question paper. Python's random module is used for the random selection of questions stored in the database; it is based on the Mersenne Twister PRNG, which has a period of 2**19937-1, helping ensure an unpredictable and unbiased question paper. When the PDF of the question paper is generated, the answers for the selected questions are simultaneously inserted into a separate PDF file.

VII. RESULTS

We evaluate the performance of the system by providing paragraphs with a varied number of sentences as input. The questions generated were checked and compared with questions generated with human English proficiency. Following this iterative process 10 times, the attributes of a confusion matrix (TP, TN, FP, and FN) were calculated. Further, using these values, the performance parameters accuracy, precision, and recall are calculated and represented graphically. Accuracy can be defined as the proximity of the measured values to the true value. By analyzing the results we can say that the system works with an accuracy of 72.9%. Precision shows the closeness of different measured values to each other, and recall is the fraction of relevant instances among the total number of instances retrieved. Figure 3 depicts the graphical representation of the performance measures calculated (the X-axis denotes the number of sentences and the Y-axis represents accuracy, precision, and recall respectively).
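The confusion-matrix bookkeeping used in the evaluation reduces to three standard formulas, sketched below. The counts are invented for illustration (they are not the paper's data), chosen only so that the accuracy lands near the reported 72.9%:

```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Invented example counts: 27 good questions kept, 8 bad ones rejected,
# 6 bad ones emitted, 7 good ones missed.
acc, prec, rec = evaluate(tp=27, tn=8, fp=6, fn=7)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```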
VIII. CONCLUSION
The system performs sentence selection and semantic analysis using NER. Grammatically sound questions are formed using NER and the syntax tree, and are stored in the database after each is mapped to an appropriate Bloom's Taxonomy level. The test paper is generated by random selection of questions for a specific category of the taxonomy. Future work includes increasing the accuracy of the system by enhancing question framing. Questions other than wh-questions (true/false, MCQs, etc.) can be incorporated. An answer evaluation module can be integrated to evaluate and score the test answers submitted by students by calculating their semantic similarity with the correct answer. Out of the numerous papers on approaches for question generation, this paper focuses on the implementation of the AQGS system in Python to contribute to automated, quick, unbiased question paper generation.

ACKNOWLEDGMENT

REFERENCES

[8]. Anderson LW, Krathwohl DR. A taxonomy for learning, teaching, and assessing: a revision of Bloom's taxonomy of educational objectives. New York, NY: Longman; 2001.