NLP and OCR Based Automatic Answer Script
International Journal of Computer Applications (0975 – 8887)
Volume 186 – No.42, September 2024
grading system. In turning the answer scripts into a machine-readable format, the OCR feature adopted in this system identifies not only textual content but is also capable of dealing with other components such as tables and figures. The paper presents methods and best approaches to grading using machine learning techniques, as well as the application of support vector machines in grading. Unfortunately, the datasets involved are not discussed in the paper. However, it is probable that the researchers employed a set of answer scripts for training/development and a set of grading criteria/rubrics for grading the students' papers.

[2] acknowledges the effectiveness of an automatic method of essay scoring in mitigating the issues of limited time for marking writing assignments and subjectivity of the grading process. The procedures adopted in the paper involve Natural Language Processing (NLP), sentiment analysis, and machine learning, specifically Long Short-Term Memory (LSTM) models, for grading essays written in English. The relevant phenomena are identified using NLP algorithms, and the system utilizes syntactic, semantic, and sentiment features of the essays to predict the grades by employing LSTM models.

[3] The paper identifies the challenges and limitations of manual evaluation of subjective answers, such as bias, inconsistency, time consumption, and human resources. It aims to develop a system that can automate the evaluation process and reduce the need for human intervention. The paper presents a two-part system: a checker and an evaluator. The checker takes a question, a student's answer, an expected answer, and total marks as input, and assigns a score to the student's answer based on grammar, keywords, and similarity. The evaluator takes a sample of students' answers and finds the best combination of evaluation techniques and weights for each question. The system allows the user to choose from different methods for keyword extraction, summarization, and similarity checking, or to use the optimal combination suggested by the evaluator.

[4] The paper first presents the background work, which is divided into research techniques covering similarity measures and machine learning techniques. The paper also reviews the pros and cons of these methods and offers some recommendations for an ideal grading system: automating answer script evaluation makes the grading system bias-free and coherent, so a model is needed that improves the precision of grading, since the outcome of the assessments concerns the student's future. Having reviewed the literature, the authors established that there exist two primary strategies in answer grading: similarity measures and machine learning strategies. While similarity-based measures do not require a large training set, they are not effective where open-ended responses must be mined. Machine learning techniques, on the other hand, expand the possible coverage of grading systems and perform well even on semi-open-ended questions, but an enormous labelled training set is needed for each question, which may not be convenient at all.

[5] The paper uses various methodologies for each component of the system, such as OCR, NLP, machine learning, and similarity algorithms. For image text extraction, the paper uses py-tesseract, a Python-based OCR tool that converts images into text. For summarization, the paper uses a keyword-based technique that selects the most frequent words and avoids the less frequent words to generate a summary. For text preprocessing, the paper uses NLTK, a popular framework for natural language processing, and performs tokenization, stop-word removal, lemmatization, bigram creation, and word frequency counting. For information retrieval, the paper uses a word2vec model to convert words into vectors and measure their semantic similarity. For mark scoring, the paper uses four similarity measures: cosine similarity, Jaccard similarity, bigram similarity, and synonym similarity, which compare the student's answer with the correct answer and calculate a score based on the angle, intersection, structure, and synonyms of the sentences.

[6] proposes a system that consists of the following steps: input image, preprocessing, feature extraction, text recognition, NLP techniques, data splitting, classification, mark evaluation, and performance metrics. The system uses the py-tesseract library for OCR, the mean and standard deviation for feature extraction, an artificial neural network (ANN) for classification, and the number of words and letters for mark evaluation. The paper uses various methodologies such as image processing, OCR, NLP, and deep learning to implement the proposed system, along with tools such as tkinter, matplotlib, and numpy for data handling and visualization. It reviews some of the previous works related to OCR, NLP, and answer evaluation using machine learning, cites some of the challenges and limitations of the existing methods, and highlights the novelty and advantages of the proposed system.

[7] Preprocessing the answer scripts involves data processing methods such as tokenization, lemmatization, and word embedding, which convert the answer scripts into numerical vector form. To do so, the paper employs deep learning techniques such as LSTMs, recurrent neural networks, and dropout to learn the semantic representation of the answer scripts and then assign scores to them. In the study, the D-DAS is trained and evaluated through supervised learning by providing answer scripts along with human-assessed scores as the manual dataset. The paper looks at the existing literature on AES and other short-answer grading systems, summarizing their strengths and weaknesses. It also walks through various forms of LSTM models, including simple LSTM, deep LSTM, and bidirectional LSTM, as well as their use in practical natural language processing and information retrieval applications.

[8] The paper outlines the earlier works on computer-based evaluation, text mining, and the measurement of text similarity. For the assessment of student performance, there currently exists an evaluation paradigm that involves a powerful and effective Natural Language Processing (NLP) algorithm. This research was followed by the creation of a tool that combines NLP analysis with an Artificial Neural Network (ANN) to perform the calculations. A filter set for matching an answer in the examination process is developed by the faculty, in the form of an answer sheet and a keyword dataset corresponding to the answer; these datasets are held in a data storage system. The results are then compared by the ANN algorithm to identify whether the student's response contains the correct answer. The student's answer is also corrected for spelling and grammatical mistakes using the NLP algorithm whenever there is unevenness. The results generated from the text mining technique are calculated as soon as the NLP and ANN techniques reach the end of their process.

[9] presents NLP techniques, such as tokenization, part-of-speech tagging, stop word removal, stemming, and semantic similarity checking, to preprocess and analyze the student answers and compare them with the standard answers. It uses Latent Semantic Analysis (LSA), an NLP technique based on a mathematical model that creates a vector representation of a document and measures the similarity between documents by calculating the distance between vectors, and Bilingual Evaluation Understudy (BLEU), an algorithm that measures the similarity between the student answer and the standard answer based on n-gram co-occurrence matching.

[10] presents a system for online paper evaluation using NLP for handwritten answer sheets and automatic mark sheet publishing. The system consists of the following modules: registration and login, upload, OCR, tokenization, and similarity check and scoring. The system allows students to upload their scanned answer sheets and teachers to upload their answer keys. It then converts the answer sheets into text using OCR, tokenizes the text and removes stop words, compares the text with the answer keys using WordNet and Corpus, and assigns marks based on the cosine similarity measure. The system also generates a mark sheet for each student and displays the results to the users. The paper uses the cosine formula to calculate the similarity score between the answer sheet and the answer key and to determine the marks obtained by the student.

[11] This paper explores the use of NLP and ML in creating a model to assess free-response answer scripts. It attempts to offer a solution to the general problem of how answer scripts in formative and summative assessments, general tests, and examinations are evaluated, especially during the COVID-19 pandemic and the lockdown. Accordingly, the paper presents a model for scoring descriptive answers using a similarity feature computed from answer keywords extracted from the reference solution. The paper also examines several prior systems and research studies that assess answer scripts by employing text extraction, similarity estimation, BLEU engineering modification, probabilistic semantic/text relatedness assessment, ontology, artificial neural networks, WordNet, Word2vec, WMD, cosine similarity, multinomial naïve Bayes, and term frequency-inverse document frequency. The paper validates the model on a local dataset by comparing the reference answers with student answers on computerized tests and comparing the two sets of answers manually. It states that the proposed model achieved an average accuracy of 80% and produced a text file giving the score for each answer. To support these claims, the paper also presents a graphical representation of the validation process carried out manually and with the proposed system.

[12] The paper proposes a system called Automatic Answer Checker (AAC), which consists of a web-based interface for uploading question papers and answer sheets, and a machine learning module for analyzing and scoring the answers. The system uses natural language processing techniques such as word tokenization, stop-word and punctuation removal, and stemming to preprocess the text and extract keywords. The system then compares the keywords in the student's answer with the keywords in the model answer and calculates a similarity score. Based on the score, the system assigns marks to the student and displays them on the web interface.

[13] provides a system for the automatic scoring of descriptive answers using machine learning. Feature extraction is another important process in the system, where features are extracted through n-grams, cosine similarity, latent semantic analysis, and string similarity. The system also employs categorization models such as artificial neural networks, support vector machines, and linear regression to assign grades, and it gives specific scores reflecting the level of the answers, along with recommendations and tips. The paper's literature review focuses on automated natural-language question answering and the evolution of research in this field, from the initial advances in artificial intelligence to the present. The paper categorizes the existing systems into three types: corpus-based, information extraction, and mapping. Furthermore, it provides an overview of research limitations and future tasks in the domain, which include content analysis, semantic analysis, and feedback systems.

[14] addresses the challenge of evaluating students' performance through answer scripts. Traditional manual evaluation can be biased and is influenced by various factors such as the evaluator's mood swings and the relationship between the student and the evaluator. The paper proposes an automatic answer script evaluation system based on Natural Language Processing (NLP). The system takes a student's written answer as input and automatically awards marks after the evaluation, considering factors such as spelling errors, grammatical errors, and various similarity measures. The system uses NLP for handling the English language used in the answers. For summary generation from the extracted text, keyword-based summarization techniques are used, and four similarity measures (Cosine, Jaccard, Bigram, and Synonym) serve as parameters for generating the final mark. The paper discusses the motivation behind automated answer script evaluation, which includes less time consumption, less manpower involvement, protection against the human evaluator's psychological changes, and easy record keeping and extraction.

[15] The paper presents a text analysis pipeline consisting of four stages: OCR, sentence boundary detection, tokenization, and part-of-speech tagging. The paper uses freely available open-source software packages for each stage and applies them to a large dataset of scanned news articles with different levels of degradation. It proposes a novel evaluation paradigm based on hierarchical dynamic programming to measure and analyze the impact of OCR errors on NLP stages, and it compares the results of the text analysis stages on the clean and noisy versions of the same documents using this paradigm, which can identify and track individual OCR errors and their cascading effects.

3. ARCHITECTURE
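The architecture, like most of the surveyed systems, chains three stages: OCR text extraction, NLP preprocessing, and similarity-based mark scoring. As an illustration only, here is a minimal pure-Python sketch of the last two stages (the function names and the tiny stop-word list are our own; a real pipeline would use py-tesseract for the upstream OCR step and NLTK for full preprocessing, as in [5] and [6]):

```python
import math
import re

# Tiny stop-word list for illustration; the surveyed systems use NLTK's full list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "by"}

def preprocess(text):
    """Lowercase, tokenize, and drop stop words (stemming/lemmatization omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens):
    """Build a term-frequency vector as a dict of token counts."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_answer(student_text, model_text, total_marks):
    """Award marks in proportion to similarity with the model answer."""
    sim = cosine_similarity(
        bag_of_words(preprocess(student_text)),
        bag_of_words(preprocess(model_text)),
    )
    return round(sim * total_marks, 2)

model = "OCR converts scanned answer scripts into machine readable text."
student = "Scanned answer scripts are converted into machine readable text by OCR."
print(score_answer(student, model, 10))  # → 8.89
```

The Jaccard, bigram, and synonym measures used in [5] and [14] could be added alongside cosine similarity and combined into a weighted final mark.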
4. METHODOLOGY
4.1 Data Collection and Preprocessing
1) Answer Script Collection: Collect a diverse set of
handwritten or typed answer scripts from various
educational institutions or examinations. Ensure that
the dataset covers a range of subjects, difficulty
levels, and writing styles.
2) Digitization: Scan the collected answer scripts to
create digital images or documents that can be
processed by the OCR system.
3) Ground Truth Preparation: Establish a ground truth
dataset by manually grading a subset of the collected
answer scripts. This ground truth will be used to train
and validate the NLP algorithms.
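The ground-truth step above can be organized as a small table pairing each digitized script with its manually assigned score, then split into training and validation subsets. A minimal sketch, where the field names and the 80/20 split ratio are illustrative assumptions rather than values specified here:

```python
import random

def make_ground_truth(script_ids, manual_scores):
    """Pair each digitized script with its manually assigned grade (step 3)."""
    if len(script_ids) != len(manual_scores):
        raise ValueError("every script needs exactly one manual score")
    return [{"script_id": s, "manual_score": m}
            for s, m in zip(script_ids, manual_scores)]

def split_ground_truth(rows, train_fraction=0.8, seed=42):
    """Reproducibly shuffle and split the graded scripts into train/validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

gt = make_ground_truth([f"S{i:03d}" for i in range(1, 11)],
                       [7.5, 6.0, 9.0, 4.5, 8.0, 5.5, 7.0, 6.5, 8.5, 3.0])
train, val = split_ground_truth(gt)
print(len(train), len(val))  # 8 2
```

Fixing the shuffle seed keeps the split reproducible, so the NLP models are always validated against the same held-out scripts.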
Fig 2. Workflow Diagram

5. RESULTS
To orchestrate this system, a methodology was devised to modernize the assessment process of educational institutions. A dynamic website with strong login authentication is built to upload and view answer scripts using the above steps. The site also has an intuitive interface that can be easily explored, and the collection of digitized scripts can be accessed by student ID, department, semester exam, and subject.

Upon submission, the answer scripts are subjected to an OCR process that converts the handwritten or typed data into computer-readable text. This is not a simple mechanical conversion of text; it is the point where the key constructs from each response are built, recorded, and made available for analysis.

The extracted text is recorded in a secure database for further evaluation and later human review. The core of the system is its NLP capabilities, where trained algorithms read through the answers, pick up on semantic nuances, and check responses for coherence and consistency along many dimensions. On this linguistic analysis the system constructs a scoring framework that ensures the questions are assessed fairly and thoughtfully. Moreover, once the evaluation is generated, the response is compared, from a cosine similarity point of view, against ground truth in the training corpus. This analysis is the objective foundation for awarding marks in a manner that guarantees fairness and alleviates teacher bias in grading. Meanwhile, for answers adorned with diagrams and visual representations, a cutting-edge deep learning model

6. CONCLUSION
The development and implementation of an Automated Answer Script Evaluation System represent a pivotal advancement in the educational technology landscape, aiming to address the challenges associated with manual evaluation processes. The system outlined in this report integrates cutting-edge technologies such as Optical Character Recognition (OCR) and Natural Language Processing (NLP) to revolutionize the grading paradigm. The comprehensive set of functional requirements, usability enhancements, and non-functional considerations collectively shape a robust framework for an efficient, accurate, and user-friendly solution. The system's key functionalities, including user authentication, answer script submission, OCR processing, NLP analysis, non-textual element recognition, grading interface, feedback mechanism, and data storage, collectively ensure a holistic approach to automated evaluation. By implementing role-based access control and real-time feedback mechanisms, the system not only streamlines the evaluation process but also contributes to improved educational outcomes and personalized learning paths.

The emphasis on non-functional requirements, including performance, scalability, usability, maintainability, and compatibility, underscores the commitment to delivering a solution that meets the highest standards of efficiency, reliability, and adaptability. The software requirements, centered around web hosting, NLP modules, and a secure database, along with specific hardware prerequisites, form the backbone of a technology stack designed to handle the complexities of large-scale assessment processes.

7. REFERENCES
[1] A. Rokade, B. Patil, S. Rajani, S. Revandkar, and R. Shedge, "Automated Grading System Using Natural Language Processing," in 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), Coimbatore, India, 2018, pp. 1123-1127, doi: 10.1109/ICICCT.2018.8473170.
[2] V.S. Sadanand, K.R. Guruvyas, P.P. Patil, J. Janardhan Acharya, and S. Gunakimath Suryakanth, "An automated essay evaluation system using natural language processing and sentiment analysis," International Journal of Electrical and Computer Engineering (IJECE), 2022.
[3] V. Kumari, P. Godbole, and Y. Sharma, "Automatic Subjective Answer Evaluation," 2023, doi: 10.5220/0011656000003411.
[4] A.K.R. Maya, J. Nazura, and B.L. Muralidhara, "Recent