


NLP-based Automatic Answer Script Evaluation


Md. Motiur Rahman¹ and Fazlul Hasan Siddiqui²*

¹ Dept. of Physical and Mathematical Sciences, Chittagong Veterinary and Animal Sciences University, Chittagong, Bangladesh
² Dept. of Computer Science and Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh
* Corresponding author's email: [email protected]

ABSTRACT

Answer script evaluation is an important part of assessing students’ performance. Typically, answer script evaluation is done manually, which can sometimes be biased: the outcome depends on various factors like the evaluator’s mood swings and the relationship between the student and the evaluator. Additionally, evaluation is a very tedious and time-consuming task. In this paper, a natural language processing-based method for automatic answer script evaluation is presented. Our experiment consists of text extraction from the answer script, measuring various similarities between the summarized extracted text and the stored correct answers, and then assigning a weight value to each calculated parameter to score the answer script. For summary generation from the extracted text, we have used a keyword-based summarization technique. Four similarity measures (Cosine, Jaccard, Bigram, and Synonym) are used as parameters for generating the final mark. Automatic evaluation of answer scripts has been found very useful in our experiments, and the assigned marks are often the same as the manually scored marks.
Keywords: Automatic Evaluation, NLP, Text Summarization, Similarity Measure, Marks Scoring

1. INTRODUCTION

There are various assessment strategies that are used to evaluate a student’s performance. The most widely used technique is descriptive question answering, in which a student expresses his/her opinion in response to the question in a long textual form. An automatic descriptive answer evaluation system would be very helpful for universities and educational institutions in assessing a student’s performance effectively [1]. A student may answer a question following different grammatical styles, and may choose different words that are similar to the actual answer. The motivation behind automated answer script evaluation comes from less time consumption, less manpower involvement, elimination of the human evaluator’s psychological changes, and easy record keeping and retrieval [2]. It also assures that mood swings or a change in perspective of the human assessor will not affect the evaluation process.

Automatic answer script evaluation based on Natural Language Processing (NLP) will help us to overcome the difficulties faced in manual evaluation. Here a student’s written answer is provided as input and the system automatically scores marks after the evaluation. The system considers all possible factors, like spelling errors, grammatical errors, and various similarity measures, for scoring marks. The natural language processing technique is used to make the handling of the English language much easier.

Natural language processing is an area of artificial intelligence which deals with the interaction between human languages and computers [3]. The most challenging tasks in natural language processing involve speech recognition, natural language understanding, and natural language generation. NLP is widely used in machine translation, question answering, automatic text summarization, answer script evaluation, etc. [3-4]. Text summarization helps to find the precise data in a longer text document, and speeds up the evaluation process.

Text summarization is the process of creating a short, accurate summary of a longer text. It is very time consuming to generate a summary of a long article manually. Hence an NLP-based automatic text summarization technique is used to facilitate and speed up the text processing. Two types of text summarization techniques are used for generating summaries. The extractive text summarization technique selects phrases and sentences from the source document and generates a new summary from them [5]. The abstractive text summarization technique is the opposite of the extractive technique: it generates entirely new phrases and sentences to hold the meaning of the source document [6]. The NLP-based strategies are very well suited for generating summaries compared with the manual process. The summarized text is fed as input to compute various similarity measures.


A similarity measure is a technique to find how much two sentences are similar in terms of semantics, syntax, and structure. Similarity measures enable us to decide the marks to score for an answer script [7]. For measuring similarity, different algorithms, like cosine similarity, Jaccard similarity, bigram similarity and synonym similarity, are used [8]. Each individual similarity measure captures a separate aspect of similarity. The cosine similarity between two documents generates a metric which tells how two documents are related by looking at the angle instead of the magnitude. The Jaccard similarity defines the similarity between two sets of documents, and it is computed by dividing the length of the intersection by the length of the union of the two document sets. The bigram similarity deals with the structure of two sentences and tells whether the two are similar or not with respect to structure [9]. The synonym similarity tells how much two sentences are similar with respect to synonyms.

To ease the manual evaluation process, automatic marks scoring has become very popular. Automatic marks scoring can be accomplished with the help of machine learning: some parameters are used to train a machine learning algorithm, and after training it can automatically assign a score [10]. Another approach is assigning a weight value to each parameter, based on its importance, and then multiplying the parameter value by the weight value. The summation of these products defines the marks of the corresponding answer.

To make the answer script evaluation system faster and more effective, a digital method based on NLP is presented in this paper for automatic answer script evaluation.

2. BACKGROUND

Answer script evaluation is a very crucial part of student assessment. A teacher follows various approaches, like short question answering, descriptive question answering and multiple choice questions, to assess students [11]. The evaluation of multiple choice questions and short questions is easy and less time consuming, while descriptive question answering takes more time to evaluate. Several methods have been developed for automatic answer script evaluation. Some of them are mentioned in the following subsections.

2.1 Automatic Short Question Evaluation System

A vector-based technique for short question evaluation was proposed by Ahmed Magooda et al. They investigated sentence representation techniques and a wide range of similarity measures for automatic grading of questions. For similarity measures, they considered string similarity, knowledge-based similarity, and corpus-based similarity. They used two different datasets to evaluate their proposed method, which was able to evaluate short questions with an accuracy of up to 86 percent [12].

A simple short question evaluation method was developed by Md Arafat Sultan et al. They gave the short question and its correct answer as input and computed only the semantic similarity of the student response with respect to the correct answer. They also focused on short text similarity and augmented similarity. They compared the performance of their model against the Mohler et al. dataset and a simpler bag-of-words model, and observed that their proposed model works better than the bag-of-words model [13].

Michael Mohler et al. developed a model for automatic short answer grading. They used unsupervised techniques for automatic short answer grading, and considered knowledge-based and corpus-based similarity as well as the effect of the domain and size of the corpus [14]. They added automatic feedback from student answers in order to improve the performance. Their model outperformed the previously proposed models. However, they did not take grammatical and spelling errors into account for grading.

Jonathan Nau et al. described a method for automatic short question answering for the Portuguese language [15]. They combined latent semantic analysis and a WordNet path-based similarity measure using linear regression to predict the score for short questions. They compared the predicted scores to human scores, and the combined method was found very useful.

P. Selvi et al. introduced a method for an automatic short answer grading system, which is based on simple lexical matching [16]. They performed some comparisons with existing methods and found that their proposed model worked well in a few cases. It can grade short questions with 59 percent accuracy.

2.2 Automatic Descriptive Question Evaluation System

The evaluation of descriptive questions is quite difficult in comparison with short question evaluation. It takes more time to evaluate, and the accuracy depends on various factors [17]. Hence, many researchers have proposed methods for automatic descriptive answer evaluation. Some are presented below.

Shehta et al. developed a model for automatic descriptive answer evaluation [18]. They divided their proposed system into a student module and a tutor module. Their model takes the student answer and the tutor answer as input and calculates the semantic similarity between the two answers, which helps to score marks. They used full NLP to implement their model.


Their developed model does not fit all types of data, since they focused only on semantic similarity; there were other factors that influenced the scored marks.

A pattern matching algorithm based method was proposed by Pranali Nikam et al. for the assessment of descriptive answers [19]. In their study, they represented the student answer and the true answer in the form of graphs, and then matched the pattern between the two graphs. They match each word of the student answer with the true answer. If any word does not match the true answer, they find the synonyms of that word and match those synonyms with the true answer. If a match is found, they replace the original word with the synonym and compute the similarity. Here, if two sentences contain the same words but out of order, the method gets confused and produces a wrong score.

A text similarity-based method for automatic scoring of descriptive type tests was developed by Izuru Nogaito et al. [20]. They measured n-gram and word-level matching similarity with BLEU and RIBES respectively. They also calculated Doc2Vec based cosine similarity. They found that the most effective similarity measure depends on the type of question: based on the question, the effectiveness of the similarity measurement techniques varies.

Marcelo Loor et al. [21] proposed a method with a combination of LSA, BLEU, WMD and fuzzy logic. They used LSA to find the semantic similarity between two documents. They used WMD to calculate the cumulative distance that a word needs to travel to reach the reference word; the cumulative distance measures distance even if there is no common word. Finally, they used fuzzy logic to score the marks. They applied their proposed model to various datasets and found that the accuracy varies between 0.71 and 0.85.

Most of the researchers focused on semantic similarity for scoring marks. They did not consider all the other similarity parameters for deciding the score. In this experiment, a novel approach is proposed with different similarity measures, and these similarity measures are used as parameters. Finally, a weight value is assigned to each parameter based on its importance to calculate the marks of that question.

3. METHODOLOGY

The aim of this study is to evaluate the descriptive answer script automatically and assign marks to the respective question. In order to accomplish this, we take the answer script as input. The Python programming language is used for implementing every algorithm. NLP is then used to extract text from the answer script and process the data. Various similarity measures are calculated and used as the parameters for assigning marks.

3.1 Text Extraction

The captured image of the answer script has been used as input for text extraction. For extracting text from the image, the Python class pytesseract has been used. Before extracting text, the noise in the image is removed to increase the extraction accuracy. Pytesseract is a class-based OCR, has Unicode (UTF-8) support, and can recognize more than 100 languages. The result of pytesseract is shown in Fig. 1 and Fig. 2. The extracted text has been used for further processing and for computing the various similarity measures.

"Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, and with support for a variety of digital image file format inputs. Some systems are capable of reproducing formatted output that closely approximates the original page including images, columns, and other non-textual components."

Fig. 1: Input image

======RESTART: C:\Python36\imtotext.py======
Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degr ee of recognition accuracy for most fonts are now common, and with support for a variety of digital image file format inputs. Some systems are capable of reproducing formatted output tha t closely approximates the original page including images, columns, and other non-textual compon ents.

Fig. 2: Output text
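To illustrate the extraction step, the following is a minimal sketch of pulling text from an answer-script image with pytesseract and OpenCV. The file name and the simple median-blur denoising step are illustrative assumptions, not the exact preprocessing used by the authors.

```python
import cv2
import pytesseract

def extract_text(image_path):
    """Read an answer-script image, reduce noise, and run Tesseract OCR on it."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # OCR works best on a grayscale image
    denoised = cv2.medianBlur(gray, 3)               # simple noise removal (assumed step)
    return pytesseract.image_to_string(denoised)     # recognized text as a string

if __name__ == "__main__":
    print(extract_text("answer_script.png"))         # hypothetical file name
```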
3.2 Summary Generation

From the image, the text is extracted in text format, and natural language processing is used to make an automatic summary of the long text. Summary generation helps to speed up the text processing task by ignoring the less important sentences in the long text document. Several techniques are available for generating an automatic summary. In order to generate the summary of the long text, some keywords are selected from the long text based on the occurrence of the words. Here the words of average frequency have been selected as keywords, while the most frequent and least frequent words are ignored. Then the weight of each sentence in the text is calculated as the number of keywords in the sentence squared, divided by the window size.


The window size is the maximum distance between two significant words in a sentence. The sentences are then sorted in descending order based on their weight values, and finally the first n sentences are taken as the summary of the long text.

Pseudocode of the text summarization algorithm:

1. Take text as input
2. Tokenize the text into words
3. Remove duplicates from the word list
4. Count the frequency of each word
5. Calculate the word percentage by dividing the word frequency by the length of the word list
6. Remove the most frequent and least frequent words by comparing the word percentage with a max and min threshold value, and select the average frequent words as keywords
7. Count the window size for each sentence with the help of the keywords
8. Calculate the weight of each sentence by dividing the square of the number of keywords in the sentence by the window size
9. Sort the sentences in descending order based on the weight values and select the first n sentences as the summary
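A minimal sketch of the keyword-based summarizer described by the pseudocode above is given below. The frequency thresholds and the definition of a sentence's window size (the span between its first and last keyword) are illustrative assumptions; the sentence weight follows step 8, i.e. (number of keywords in the sentence)² / window size.

```python
from nltk.tokenize import sent_tokenize, word_tokenize  # requires nltk 'punkt' data

def summarize(text, n=3, min_pct=0.001, max_pct=0.05):
    """Keyword-based extractive summary following the pseudocode above."""
    words = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    total = len(words)
    # Keep "average frequent" words: drop the most and least frequent ones (thresholds assumed).
    keywords = {w for w, c in freq.items() if min_pct < c / total < max_pct}

    scored = []
    for sent in sent_tokenize(text):
        tokens = [w.lower() for w in word_tokenize(sent)]
        positions = [i for i, w in enumerate(tokens) if w in keywords]
        if not positions:
            continue
        window = positions[-1] - positions[0] + 1        # span covered by the keywords
        scored.append((len(positions) ** 2 / window, sent))

    # Highest-weight sentences first; the top n form the summary.
    return " ".join(s for _, s in sorted(scored, reverse=True)[:n])
```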
Another approach, based on the bag-of-words and ignoring keywords, is also used. In order to find the more effective technique for summary generation, we have calculated Precision, Recall and F-score. The precision defines how much of the system summary (machine generated) is in fact relevant:

Precision (P) = Number of overlapping sentences / Number of sentences in the system summary   (1)

The recall specifies how much of the reference summary (human generated) is recovered by the system summary:

Recall (R) = Number of overlapping sentences / Number of sentences in the reference summary   (2)

The F-score is a measure that combines precision and recall. The basic way to calculate the F-score is to compute the harmonic mean of precision and recall:

F-score = 2 × P × R / (P + R)   (3)

Here, the F-score of the keyword-based summary generation technique is greater than that of the bag-of-words based summary generation. The generated summary is then compared with the true answer to find the various similarity measures. The summary generation techniques and findings are discussed in detail in the Result and Discussion section.
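The three summary-quality measures of Eqs. (1)-(3) can be computed directly from the sentence overlap; a small sketch is shown below, where the sentence lists are placeholders for the system and reference summaries.

```python
def summary_scores(system_sents, reference_sents):
    """Precision, recall and F-score of a system summary against a reference summary (Eqs. 1-3)."""
    overlap = len(set(system_sents) & set(reference_sents))          # overlapping sentences
    precision = overlap / len(system_sents) if system_sents else 0.0
    recall = overlap / len(reference_sents) if reference_sents else 0.0
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_score
```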
3.3 Text Preprocessing

The summarized text contains some words which carry little information and can be ignored to facilitate the further text processing task. The process of converting data into a form that a computer can understand is known as preprocessing. Natural language processing is a very effective way to deal with text preprocessing. Text preprocessing includes tokenizing the text into words, removing stopwords, lemmatizing words, removing duplicate words, etc. To accomplish this preprocessing with NLP, the Natural Language Toolkit (NLTK), a leading platform for building Python programs that work with human language data, is used. It has many built-in functions that handle text preprocessing with few commands. The NLTK built-in function word_tokenize is used to split the text into words and store them in a list. The most important text preprocessing step is filtering out the useless words. NLTK has a StopWord corpus which contains frequently occurring words that are useless for defining the meaning of a sentence. The StopWord corpus has been used to filter out the unnecessary words.

Another text preprocessing step is word lemmatization. A word may appear in different forms in many languages; for example, the word walk may appear as walking, walked, and walks. Lemmatization is the process of converting a word into its base form, which is known as the lemma. It compresses the length of the word list and saves processing time. In order to lemmatize each word, the NLTK built-in class WordNetLemmatizer is used, which converts every word into its corresponding base form.

For carrying out some operations over the data, the data need to be formatted in a common format. One such format is the bigram (or digram), which is a sequence of two adjacent elements from a string of tokens. The bigram frequency distribution is commonly used to analyze the structural similarity of text. To generate bigrams, the bigrams function of NLTK is used, and it returns the list of bigrams over all words. Here the frequency of each word is also counted and stored in a dictionary, where the word is used as the key and the number of occurrences is stored as the value. Then the word dictionary with frequencies and the bigrams are used for measuring the various similarities.
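The preprocessing pipeline described here (tokenization, stopword removal, lemmatization, bigram generation and frequency counting) can be sketched with NLTK as follows. This is a simplified illustration of the listed steps, not the authors' exact code.

```python
from nltk.corpus import stopwords           # requires nltk 'stopwords' data
from nltk.stem import WordNetLemmatizer     # requires nltk 'wordnet' data
from nltk.tokenize import word_tokenize     # requires nltk 'punkt' data
from nltk.util import bigrams

def preprocess(text):
    """Tokenize, drop stopwords, lemmatize, and build word frequencies and bigrams."""
    lemmatizer = WordNetLemmatizer()
    stop = set(stopwords.words("english"))
    tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]
    words = [lemmatizer.lemmatize(w) for w in tokens if w not in stop]

    freq = {}                                # word -> number of occurrences
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq, list(bigrams(words))        # bigrams = pairs of adjacent tokens
```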
3.4 Similarity Measure

In many cases, it is necessary to determine whether two sentences are similar or not. A similarity measure tells how similar two sentences are by considering different angles of similarity. Several similarity measure techniques are available. In this experiment, cosine similarity, Jaccard similarity, bigram similarity, and synonym similarity are computed.


Cosine similarity is a very interesting similarity measure technique which looks at the angle between two documents and tells how similar they are:

Cosine-similarity (A, B) = (A · B) / (||A|| · ||B||)   (4)

where A and B are word vectors and each component of a vector contains a word frequency or TF-IDF value. Here the cosine similarity measure is computed between the student answer and the true answer. The cosine similarity measure provides a very prominent result in terms of similarity. Cosine similarity has been implemented in this experiment in the Python language. The pseudocode is shown below.

Pseudocode of Cosine Similarity

1. Take the dictionaries of words and frequencies as input
2. Create two word vectors, one for the student answer and another for the true answer. The length of each vector should be the length of the total word list
3. Calculate the dot product of the two vectors
4. Compute the norm of the first vector
5. Compute the norm of the second vector
6. Multiply the first and second norms
7. Divide the dot product by the product of the norms; this gives the cosine similarity
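A minimal sketch of the cosine similarity step, following the pseudocode above: the two frequency dictionaries are turned into vectors over the combined vocabulary and Eq. (4) is applied.

```python
import math

def cosine_similarity(freq_student, freq_true):
    """Cosine similarity between two word-frequency dictionaries (Eq. 4)."""
    vocab = set(freq_student) | set(freq_true)
    a = [freq_student.get(w, 0) for w in vocab]   # student-answer vector
    b = [freq_true.get(w, 0) for w in vocab]      # true-answer vector
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```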
The Jaccard similarity is another similarity measure technique, which tells the degree of similarity by measuring the intersection and union of two word lists:

Jaccard-similarity (A, B) = |A ∩ B| / |A ∪ B|   (5)

where A and B are two word lists. The Jaccard similarity is measured by dividing the intersection of the two word lists by their union. The intersection defines how many words are common between the two word lists, and the union defines the total words in both lists.

Pseudocode of Jaccard Similarity

1. Take two word lists as input
2. Perform the intersection operation between the two word lists. The AND (&) set operation performs the intersection
3. Perform the union operation between the two word lists. Here, add the lengths of the two word lists and subtract the length of the intersection; that is the union of the two word lists
4. Divide the intersection result by the union result; this produces the Jaccard similarity
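The Jaccard step of Eq. (5) can be sketched as below, using Python set operations as in the pseudocode.

```python
def jaccard_similarity(words_student, words_true):
    """Jaccard similarity between two word lists (Eq. 5)."""
    a, b = set(words_student), set(words_true)
    intersection = len(a & b)                  # number of common words
    union = len(a) + len(b) - intersection     # total number of distinct words
    return intersection / union if union else 0.0
```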
In this study, the structural similarity between two documents is also taken into account. In order to compute the structural similarity, the bigram similarity measure has been performed. The pseudocode is presented below.

Pseudocode of Bigram Similarity

1. Take two word lists as input
2. Generate bigrams from the two word lists. A bigram is a sequence of two adjacent tokens in a string
3. Compute the number of common bigrams in the two bigram lists
4. Divide the number of common bigrams by the average bigram-list length of the two lists
5. The division produces the bigram similarity
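A small sketch of the bigram similarity pseudocode: bigrams are generated from each word list, the shared bigrams are counted, and the count is divided by the average number of bigrams. Comparing distinct bigrams via sets is a simplification assumed here.

```python
from nltk.util import bigrams

def bigram_similarity(words_student, words_true):
    """Structural similarity based on shared bigrams, per the pseudocode above."""
    bg_student = set(bigrams(words_student))
    bg_true = set(bigrams(words_true))
    common = len(bg_student & bg_true)
    avg_len = (len(bg_student) + len(bg_true)) / 2   # average bigram-list length
    return common / avg_len if avg_len else 0.0
```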
In many languages, a word has many synonyms that hold a similar meaning. Hence, during the evaluation of an answer script, the synonyms of a word have to be considered for scoring marks. In this study, each word of the student answer is matched with the true answer. If no matching word is found in the true answer, then all synonyms of that word are retrieved and matched against the true answer. To generate the synonyms of a word, the NLTK WordNet function synsets is used. The synonym similarity is measured as the number of actual and synonym words of the student answer matched with the true answer, divided by the average word length of the two documents.

Pseudocode of Synonym Similarity

1. Take two word lists as input
2. Match each word of the student answer with the true answer and count the number of matches
3. If there is no matching word in the true answer, then generate the synonyms of that word
4. Match each synonym of that word with the true answer and count the number of matches
5. Divide the number of matches by the average length of the two documents
6. The division generates the synonym similarity value
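A sketch of the synonym similarity measure using the WordNet synsets function mentioned above. Matching on lemma names and dividing by the average document length follow the pseudocode, while the lowercase normalization and exact matching rules are assumptions.

```python
from nltk.corpus import wordnet   # requires the NLTK 'wordnet' corpus

def synonym_similarity(words_student, words_true):
    """Count student words matching the true answer directly or via a WordNet synonym."""
    true_set = {w.lower() for w in words_true}
    matches = 0
    for word in (w.lower() for w in words_student):
        if word in true_set:
            matches += 1
            continue
        # All synonym lemmas of the word, gathered from every WordNet synset.
        synonyms = {lemma.name().lower() for syn in wordnet.synsets(word)
                    for lemma in syn.lemmas()}
        if synonyms & true_set:
            matches += 1
    avg_len = (len(words_student) + len(words_true)) / 2
    return matches / avg_len if avg_len else 0.0
```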
The efficient evaluation of an answer script also depends on grammatical and spelling correctness. In this experiment, grammatical and spelling mistakes are also taken into consideration. To count the spelling and grammar errors, the Python package language_check is used. The four computed similarity measures and the grammatical-spelling error are used as the parameters for automatic marks scoring.
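The paper counts grammar and spelling issues with the language_check package; a minimal sketch of that idea is shown below. Normalizing the error count by the word count to obtain a 0-1 style parameter is an assumption for illustration.

```python
import language_check   # pip install language-check (wraps LanguageTool, requires Java)

def grammar_error_parameter(text):
    """Grammar/spelling parameter: share of words flagged by LanguageTool (normalization assumed)."""
    tool = language_check.LanguageTool('en-US')
    matches = tool.check(text)                 # one Match object per detected issue
    words = max(len(text.split()), 1)
    return min(len(matches) / words, 1.0)      # clamp to the 0-1 range used for parameters
```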
3.5 Marks Scoring

One purpose of this study is to automatically score marks after evaluation. This is the final step of the experiment, and the accuracy of this step will enhance the overall impact of this study.


Here, a weight value is assigned to each parameter based on the importance of the parameter. To improve the accuracy of the assigned weight values, a survey study over 50 samples has been carried out. The average weight value estimated from the survey is accepted and applied.

Marks = Σk (Pk × Wk)   (6)

where Pk is the kth parameter value and Wk is the weight value of the kth parameter. After assigning the weight value to each parameter, the weight value and the parameter value are multiplied. All the products are then added, which gives the final marks of that answer script.
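Eq. (6) amounts to a weighted sum of the parameter values. The sketch below illustrates this; the parameter values and weight values are hypothetical placeholders rather than the survey-derived weights of Table II, which is not reproduced in this text.

```python
def score_marks(parameters, weights):
    """Final marks as the weighted sum of Eq. (6): sum of P_k * W_k over all parameters."""
    return sum(parameters[name] * weights[name] for name in parameters)

# Hypothetical parameter values and weights for a 10-mark (M10) question;
# the real weights come from the authors' survey (Table II).
params = {"cosine": 0.8, "jaccard": 0.7, "bigram": 0.6, "synonym": 0.9, "grammar": 0.95}
weights = {"cosine": 2.0, "jaccard": 1.5, "bigram": 2.0, "synonym": 3.5, "grammar": 1.0}
print(score_marks(params, weights))   # illustrative score out of 10
```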
In order to test our experiment, a total of thirty sample descriptive questions and the student answers to those questions have been evaluated in a manual way. Three types of question, in terms of marks, are considered for this experiment: 5-mark questions (M5), 10-mark questions (M10) and 15-mark questions (M15). It has been seen that in most cases, our proposed method scores marks very near to the manual judgment.
4. RESULT AND DISCUSSION

The goal of this study is to evaluate descriptive answer scripts automatically and score marks. This will reduce the time for evaluating answer scripts and bring equality to the evaluation. To satisfy these requirements, we used a weighted-parameter technique for automatic evaluation. The summary generation from the extracted text plays an important role in the effectiveness of this experiment. To select an efficient technique for summary generation, we calculated the F-score of the summaries generated by the two techniques against a reference summary. The estimated F-scores of the keyword-based summarization and the bag-of-words based summarization are shown in Table 1. Table 1 indicates that the F-score of our chosen summarization technique is greater than that of the bag-of-words based summarization technique.

Table 1: F-score Calculation

            Keyword-based Summarization    Bag-of-words based Summarization
Precision   0.90                           0.83
Recall      0.83                           0.41
F-score     0.86                           0.53

In this experiment, five parameters have been considered for scoring marks. These are synonym similarity, bigram similarity, grammatical-spelling error, cosine similarity and Jaccard similarity. These parameters are used to automatically evaluate the three types of question (M5, M10, and M15) in terms of marks. A different weight value is assigned to each parameter based on the question type. The weights assigned to each parameter are shown in Table II. Each weight value is taken by averaging the survey values for that parameter. From Table II, it is found from the survey that the importance of the synonym parameter is the highest and that of the grammatical-spelling error parameter is the lowest for the evaluation of an answer script. A high weight value indicates that the parameter is more important for deciding the marks. The value of each parameter lies between zero and one, based on the similarity and the presence of errors. A higher parameter value means the similarity between the two documents is greater, and vice versa. In this study, thirty answer scripts for the three types of question are evaluated and the marks are used for testing the accuracy of the proposed model. Additionally, the above-mentioned five parameters are calculated from those thirty answer scripts and used for automatic marks scoring. The manually evaluated marks and the auto-scored marks are shown in Table III. From Table III, we see that our proposed automatic answer script evaluation system scores marks very near to the manually scored marks. The comparison of the automated scored marks and the manually computed marks is shown in Fig. 3. From Fig. 3, we have found that there is only a slight difference between the automated scored marks and the manually scored marks. In most cases the automatically assigned marks and the manually assigned marks are very close. When the student answer and the true answer contain more structural similarity as well as synonym similarity, the automated scored marks are very close to the manually scored marks. On the other hand, a notable difference between the automated scored marks and the manually scored marks exists when the student answer and the true answer have less structural similarity but more Jaccard and cosine similarity. It is also noticed from Table III and Fig. 3 that the difference between the manually scored marks and the automated scored marks is small for short questions (M5), and the opposite happens for descriptive questions (M15).

5. CONCLUSION AND FUTURE WORK

In this experiment, we have developed a natural language processing-based method for automatic answer script evaluation and marks scoring. Our system consists of the following steps: (1) text extraction from the image, (2) text summarization using a keyword-based technique, (3) text preprocessing for further analysis, (4) finding various similarity measures, and (5) marks scoring. In the first step, the text is extracted using pytesseract, which works based on OCR. Then the extracted text is summarized using the keyword-based summarization technique.


Here we accept the average frequent words as keywords and ignore the most frequent and least frequent words. The summarized text is preprocessed with the aid of NLTK, which is a leading platform for building Python programs. Here tokenization, stopword removal, lemmatization, bigram generation, and word frequency counting are performed as preprocessing. We also consider grammatical and spelling errors for answer script evaluation. After preprocessing, four similarity measures – synonym similarity, bigram similarity, cosine similarity and Jaccard similarity – are computed, which are used as the parameters for final marks scoring. In order to score marks, a weight value is assigned to each parameter after doing a survey on best weight estimation. The weight value is multiplied with the parameter value to score the final marks for that question. In this system, we have considered three types of questions based on marks, and the answer scripts for those questions were also evaluated in a manual way. The manual marks are compared with the automatically scored marks to validate our developed method. In most cases, we have found that our proposed method scores marks similar to the manually assigned marks. Only in very few cases are the automatically assigned marks slightly higher or lower than the manually assigned marks. The limitation of our research is that we assign a weight value to each parameter manually by doing a survey. Therefore, our next goal is to introduce a machine learning algorithm that will be trained on the various calculated parameters and will predict the marks of the answer script. Also, in the future, we will introduce new techniques for effective and precise summary generation.

References

[1] V. Nandini and P. Uma Maheswari, “Automatic assessment of descriptive answers in online examination system using semantic relational features,” The Journal of Supercomputing, 2018.

[2] D. V. Paul and J. D. Pawar, “Use of Syntactic Similarity Based Similarity Matrix for Evaluating Descriptive Answer,” 2014 IEEE Sixth International Conference on Technology for Education, Clappana, 2014, pp. 253-256.

[3] K. Meena and L. Raj, “Evaluation of the descriptive type answers using hyperspace analog to language and self-organizing map,” 2014 IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, 2014, pp. 1-5.

[4] Y. Oganian, M. Conrad, A. Aryani, K. Spalek, and H. R. Heekeren, “Activation Patterns throughout the Word Processing Network of L1-dominant Bilinguals Reflect Language Similarity and Language Decisions,” Journal of Cognitive Neuroscience, vol. 27, no. 11, pp. 2197-2214, Nov. 2015.

[5] S. R. Rahimi, A. T. Mozhdehi, and M. Abdolahi, “An overview on extractive text summarization,” 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), Tehran, 2017, pp. 0054-0062.

[6] P. K. Rachabathuni, “A survey on abstractive summarization techniques,” 2017 International Conference on Inventive Computing and Informatics (ICICI), Coimbatore, 2017, pp. 762-765.

[7] V. U. Thompson, C. Panchev, and M. Oakes, “Performance evaluation of similarity measures on similar and dissimilar text retrieval,” 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Lisbon, 2015, pp. 577-584.

[8] A. R. Lahitani, A. E. Permanasari, and N. A. Setiawan, “Cosine similarity to determine similarity measure: Study case in online essay assessment,” 2016 4th International Conference on Cyber and IT Service Management, Bandung, 2016, pp. 1-6.

[9] Y. Oganian, M. Conrad, A. Aryani, K. Spalek, and H. R. Heekeren, “Activation Patterns throughout the Word Processing Network of L1-dominant Bilinguals Reflect Language Similarity and Language Decisions,” Journal of Cognitive Neuroscience, vol. 27, no. 11, pp. 2197-2214, Nov. 2015.

[10] L. Gao and H. Chen, “An automatic extraction method based on synonym dictionary for web reptile question and answer,” 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), Wuhan, 2018, pp. 375-378.

[11] T. Bluche, C. Kermorvant, C. Touzet, and H. Glotin, “Cortical-Inspired Open-Bigram Representation for Handwritten Word Recognition,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017, pp. 73-78.

[12] A. Magooda, M. A. Zahran, M. Rashwan, H. Raafat, and M. B. Fayek, “Vector Based Techniques for Short Answer Grading,” Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference, 2014, pp. 238-243.

[13] M. A. Sultan, C. Salazar, and T. Sumner, “Fast and Easy Short Answer Grading with High Accuracy,” Proceedings of NAACL-HLT 2016, pp. 1070-1075.

[14] M. Mohler and R. Mihalcea, “Text-to-text Semantic Similarity for Automatic Short Answer Grading,” International Journal of Artificial Intelligence in Education 25 (2015), pp. 118-125.

[15] J. Nau, A. H. Filho, and G. Passero, “Evaluating Semantic Analysis Methods For Short Answer Grading Using Linear Regression,” PEOPLE: International Journal of Social Sciences (2017), Volume 3, Issue 2, pp. 437-450.


[16] P. Selvi and A. K. Banerjee, “Automatic Short-Answer Grading System (ASAGS),” InterJRI Computer Science and Networking (2010), Vol. 2, Issue 1, pp. 18-23.

[17] S. K. Chowdhury and R. J. R. Sree, “Dimensionality reduction in automated evaluation of descriptive answers through zero variance, near zero variance and non-frequent words techniques - a comparison,” 2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, 2015, pp. 1-6.

[18] M. A. G. Mohler, R. Bunescu, and R. Mihalcea, “Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments,” International Journal of Artificial Intelligence in Education 27 (2016), pp. 83-89.

[19] P. Nikam, M. Shinde, R. Mahajan, and S. Kadam, “Automatic Evaluation of Descriptive Answer Using Pattern Matching Algorithm,” International Journal of Computer Sciences and Engineering (2015), Vol. 3(1), pp. 69-70.

[20] M. S. M. Patil and M. S. Patil, “Evaluating Student Descriptive Answers Using Natural Language Processing,” International Journal of Engineering Research & Technology (IJERT) 2014, Vol. 3, Issue 3.

[21] M. Loor and G. De Tré, “Choosing suitable similarity measures to compare intuitionistic fuzzy sets that represent experience-based evaluation sets,” 2015 7th International Joint Conference on Computational Intelligence (IJCCI), Lisbon, 2015, pp. 57-68.

