Phase 2 Final
A PROJECT REPORT
Submitted by
RAGHUL P (190701155)
of
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
APRIL 2023
RAJALAKSHMI ENGINEERING COLLEGE
CHENNAI
BONAFIDE CERTIFICATE
ABSTRACT
Automating the evaluation of descriptive answers would help academic institutions manage the
online exam results of their students efficiently. Our project involves designing an algorithm to
automatically evaluate descriptive answers consisting of multiple sentences. Our approach
represents the student's answer and compares it with pre-defined model answers created by the
staff. To evaluate the answer, we use a pattern-matching algorithm and various supporting modules
to achieve efficient evaluation without manual labor. Many organizations can adopt this approach
to reduce manpower and save time. Natural Language Processing (NLP) aims to interpret human
language in a meaningful way and typically involves machine learning techniques. Evaluating the
objective function involves assessing candidate solutions against a portion of the training dataset,
usually measured by an error score or loss. While the objective function is easy to define,
evaluating it can be computationally costly.
ACKNOWLEDGEMENT
Initially we thank the Almighty for being with us through every walk of our life and
showering his blessings through the endeavor to put forth this report. Our sincere thanks to
our Chairman Mr. S.MEGANATHAN, B.E, F.I.E., our Vice Chairman Mr. ABHAY
SHANKAR MEGANATHAN, B.E., M.S., and our respected Chairperson Dr. (Mrs.)
THANGAM MEGANATHAN, Ph.D., for providing us with the requisite infrastructure
and sincere endeavoring in educating us in their premier institution.
Our sincere thanks to Dr. S.N. MURUGESAN, M.E., Ph.D., our beloved Principal for his
kind support and facilities provided to complete our work in time. We express our sincere
thanks to Dr. P.KUMAR,Ph.D., Professor and Head of the Department of Computer
Science and Engineering for his guidance and encouragement throughout the project work.
We convey our sincere and deepest gratitude to our internal guide,”Mrs.SUSMITA
MISHRA”,from the Department of Computer Science and Engineering. Rajalakshmi
Engineering College for her valuable guidance throughout the course of the project. We are
very glad to thank our Project Coordinator, Dr. N. Srinivasan Department of Computer
Science and Engineering for his useful tips during our review to build our project.
PRAVEEN KUMAR S
RAGHUL P
TABLE OF CONTENTS
ABSTRACT
1. INTRODUCTION
   1.1 DATA SCIENCE
   1.2 DATA SCIENTIST
       1.2.1 REQUIRED SKILLS FOR DATA SCIENTIST
   1.3 ARTIFICIAL INTELLIGENCE
   1.4 NATURAL LANGUAGE PROCESSING
   1.5 MACHINE LEARNING
2. LITERATURE REVIEW
   2.1 EXISTING SYSTEM
       2.1.1 DRAWBACKS OF EXISTING SYSTEM
   2.2 PROPOSED SYSTEM
3. SYSTEM DESIGN
   3.1 GENERAL
   3.2 SYSTEM REQUIREMENTS
       3.2.1 FUNCTIONAL REQUIREMENTS
       3.2.2 NON-FUNCTIONAL REQUIREMENTS
       3.2.3 ENVIRONMENTAL REQUIREMENTS
   3.3 WORKING PROGRESS
   3.4 DESIGN OF THE ENTIRE SYSTEM
       3.4.1 SYSTEM FLOW DIAGRAM
       3.4.2 ARCHITECTURE DIAGRAM
       3.4.3 USE CASE DIAGRAM
       3.4.4 ACTIVITY DIAGRAM
4. PROJECT DESCRIPTION
   4.1 METHODOLOGIES
       4.1.1 MODULES
   4.2 MODULES DESCRIPTION
       4.2.1 DATA PRE-PROCESSING
       4.2.2 DATA VISUALIZATION
       4.2.3 ALGORITHM IMPLEMENTATION
           4.2.3.1 DECISION TREE CLASSIFIER
           4.2.3.2 RANDOM FOREST CLASSIFIER
           4.2.3.3 ADA BOOST CLASSIFIER
       4.2.4 FLASK
5. RESULT AND DISCUSSION
6. CONCLUSION
APPENDICES
REFERENCES
CHAPTER 1
INTRODUCTION
The evaluation of student answers using computer-based methods has become a common practice
in various areas of the education system. The integration of computers in learning has revolutionized
the field of education. The computer-assisted assessment system was initially developed for
evaluating single-word answers, such as those found in multiple-choice questions, but it can also
assess paragraph answers using keyword matching. This system is highly useful in academic
institutions for checking answer sheets and can also be implemented in organizations conducting
competitive exams. The system works by taking a scanned copy of the answer sheet as input and
extracting the text of the answer after preprocessing. Model answer sets, along with keywords and
question-specific criteria, are provided by the evaluator and are used to train the system. The system
evaluates the student's answer based on three parameters: keywords, grammar, and question-specific
criteria.
1.1 DATA SCIENCE
Data science is an interdisciplinary field that involves the use of scientific methods, algorithms, and
systems to extract valuable insights and knowledge from both structured and unstructured data. The
practical applications of data science span a wide range of domains. The term "data science" was
first suggested as a possible alternative to computer science in 1974 by Peter Naur. However, it was
not until 1996 that data science was specifically featured as a topic at the International Federation
of Classification Societies conference. In 2008, D.J. Patil and Jeff Hammerbacher, who were then
leading the data and analytics efforts at LinkedIn and Facebook, respectively, coined the term "data
science." Today, data science is one of the most popular and highly sought-after professions. It
requires a combination of skills, including domain expertise, programming skills, mathematics and
statistics knowledge, and machine learning techniques to extract meaningful insights and patterns
from data that can be used in making critical business decisions.
1.2 DATA SCIENTIST
Data scientists possess the necessary skills to identify relevant questions and locate data sources to
provide answers. In addition to their analytical abilities, they also possess business acumen and can
effectively extract, refine, and present data. Many businesses employ data scientists to manage,
analyze, and organize large quantities of unstructured data.
1.3 ARTIFICIAL INTELLIGENCE
Artificial intelligence (AI) is the simulation of human intelligence in machines that are designed to
think and act like humans. It involves creating intelligent machines that can perform tasks that
typically require human intelligence, such as learning, reasoning, problem-solving, perception, and
decision-making. AI is also applied to machines that exhibit human-like traits, such as learning and
problem-solving. AI applications include advanced web search engines, recommendation systems,
speech recognition, self-driving cars, and strategic game systems. As machines become more
advanced, tasks once considered "intelligent" are often removed from the AI definition, which is
known as the AI effect. Optical character recognition is an example of a technology that is often
excluded from AI due to its routine nature.
1.4 NATURAL LANGUAGE PROCESSING
Natural language processing (NLP) is a field of artificial intelligence that focuses on making
machines capable of understanding and interpreting human language. An advanced NLP system
could enable human-like interactions with computers and allow them to learn directly from
human-written sources like news articles. Some practical applications of NLP include text mining,
information retrieval, machine translation, and question answering. Traditional approaches to NLP
involve analyzing word frequency and co-occurrence patterns to build syntactic representations of
text. However, this approach has limitations, as it may miss relevant information or fail to capture
the meaning of words in context. More modern statistical approaches to NLP use a combination of
strategies, such as keyword spotting and lexical affinity, to achieve higher accuracy levels. The
ultimate goal of NLP is to create machines that possess common sense reasoning abilities, and recent
advancements in deep learning have brought us closer to that goal. As of 2019, transformer-based
deep learning architectures are capable of generating coherent text.
1.5 MACHINE LEARNING
Machine learning involves using past data to predict future outcomes. It is a subset of artificial
intelligence that allows computers to learn and improve their performance without being explicitly
programmed. The main goal of machine learning is to develop computer programs that can adapt
and learn from new data. The process of training and prediction involves using specialized
algorithms to feed training data to a model, which can then make predictions on new test data. There
are three main categories of machine learning: supervised learning, unsupervised learning, and
reinforcement learning. In supervised learning, both the input data and corresponding labels are
provided to the model, whereas in unsupervised learning, there are no labels and the model must
figure out the patterns in the data. Reinforcement learning involves the model dynamically
interacting with its environment and receiving feedback to improve its performance over time.
Specialized algorithms are used to implement machine learning, and Python is a popular language
for this purpose.
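As a minimal illustration of this train-and-predict cycle (a sketch using scikit-learn on invented toy data, not the project's own code):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy labelled data (hypothetical): feature vectors and their class labels
X = [[0, 0], [1, 1], [0, 1], [1, 0], [2, 2], [3, 3]]
y = [0, 1, 0, 0, 1, 1]

# Hold out part of the data to act as unseen test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)    # supervised learning: inputs and labels are given
print(model.predict(X_test))   # predictions on new test data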
CHAPTER 2
LITERATURE SURVEY
The primary goal of education is to impart knowledge and skills to students in a specific subject or
field. However, the ultimate objective is for students to be able to apply this knowledge practically.
To achieve this, it is important to determine the extent to which students have absorbed the material
taught. This can be accomplished by evaluating their degree of learning through written or practical
examinations.
Objective questions are easier to evaluate using automated systems than descriptive answers.
However, assessing descriptive answers is a difficult and labor-intensive task. To address this issue,
an algorithm is proposed to automate the evaluation process of descriptive answers. The motivation
behind this automation is to expedite the evaluation process, reduce the need for manpower,
eliminate subjective biases, simplify record keeping and extraction, and ensure uniform evaluation
regardless of any mood swings or changes in perspective of the human assessor.
The examination process is crucial for evaluating the performance of students at various levels of
education, from primary to postgraduate. However, evaluating the answer booklets written by
students can be challenging due to the differences in handwriting styles, fonts, sizes, orientations,
and other factors. At the primary and high school levels, the question paper pattern typically includes
fill-in-the-blanks, matching, true/false, one-word answers, odd-man-out, and pick-out-the-odd-
word questions, which are answered in the booklets. The questions are printed, but the answers are
handwritten.
For technical subjects, manual evaluation by human evaluators is a difficult task, as it involves
assessing answers based on various parameters, including question-specific content and writing
style. Evaluating hundreds of answer scripts with similar answers can also become a tedious task
for evaluators, whose perception may vary from one another. To address these issues and expedite
the evaluation process, automation of answer script evaluation is necessary.
Manually evaluating answers is a time-consuming and tedious task that requires a lot of manpower,
and can result in unequal marks being given by the paper checker. Our system aims to automate
answer evaluation by utilizing keywords and saving manpower. The answer paper can be scanned,
and the system will assign marks for each question based on the keywords present in the answer,
using a dataset. This system will also reduce errors in marks given for a particular question.
Our application uses a machine learning algorithm that matches keywords from a dataset to
automatically evaluate answers. This is different from other applications available in the market that
only evaluate multiple-choice questions and not subjective questions. To use this application, the
answer to a particular question needs to be scanned, and the system will split the answer's keywords
using OCR technology. Based on the keywords in the answer and those in the dataset, the
application will provide marks ranging from 1 to 5.
The conventional method of evaluating subjective exams is problematic. This is because the quality
of the evaluation may differ based on the emotional state of the evaluator. To address this issue, our
proposed system employs machine learning NLP. The algorithm is designed to tokenize words and
sentences, perform part-of-speech tagging, chunking, lemmatizing words, and utilize Wordnet to
evaluate subjective answers. Additionally, our system provides the semantic meaning of the context.
The system comprises two modules. The first module extracts data from scanned images and
organizes it appropriately. The second module applies machine learning and NLP to the retrieved
text and assigns marks based on the analysis. This approach saves time and improves the accuracy
of the evaluation.
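A sketch of these NLP steps with NLTK (the library choice is an assumption; the report does not name its toolkit):

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# One-time resource downloads (resource names per classic NLTK releases)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

answer = "The processors execute instructions stored in memory."
tokens = nltk.word_tokenize(answer)   # word tokenization
tagged = nltk.pos_tag(tokens)         # part-of-speech tagging
print(tagged)

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize('processors'))   # lemmatizing: -> 'processor'

# WordNet supplies synonym sets, so 'execute' can be matched with 'run'
print({lemma.name() for syn in wordnet.synsets('execute') for lemma in syn.lemmas()})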
Organizations and educational institutes rely heavily on the examination grading system, which
mostly consists of objective questions. While these systems are beneficial in terms of resource-
saving, they fail to evaluate subjective questions. This research aims to evaluate descriptive answers
by comparing them graphically to standard answers. The proposed solution involves a subjective
answer verifier that assigns marks based on the accuracy percentage of the answer provided by
different users, with three different answers given.
To implement the system, a database containing questions, corresponding answers, and their
allocated marks is necessary. The system must verify the user's answers by comparing them with
the template answers and identifying the key elements of the responses using artificial intelligence
to assign marks.
In today's age of technological advancements, technology has become an essential need in the daily
lives of people, and the Internet of Things (IoT) is a system that provides objects and people with
unique identities and the ability to transfer data through a network without human interaction. The
IoT has a great potential for enhancing life by utilizing intelligent sensors and smart devices that
collaborate over the internet. Assessing a student's capability is usually done by evaluating the
answers they provide in an exam, which allows us to measure their learning ability. However, this
process is time-consuming, costly, and may not be entirely accurate. Therefore, an automated
system that evaluates the student's answers can provide more precise results. Unlike other answer
script evaluators, this system assesses handwritten scripts, which are more convenient and easily
accessible for the students. To use this system, the institution only needs to upload the answer key,
and the system will generate individual marks for each student by uploading their answer scripts
directly.
Numerous algorithms have been proposed for handwriting recognition and conversion, and it is not
possible to achieve full accuracy using a single technique for preprocessing. A multi-layer
perceptron classifier is suggested for identifying Bangla digits by means of a 76-component feature
array. The evaluation of subjective answer checking is not a new concept, and various techniques
have been experimented with, such as Natural Language Processing, Latent Semantic Analysis,
Generalized Latent Semantic Analysis, Bayes Theorem, and K-Nearest Neighbor.
A sentiment classification model that removes stop words was discussed, and a model was
developed that takes short answers as input and constructs RDF sentences. The model considers
both the lexical structure and synonyms while matching with the model answer for one-sentence
answers. There is a limit on the length of the answer sentences. A semi-automated evaluation
technique was used to evaluate subjective papers, where a question base and an answer base were
created with model answers. The student answer is evaluated based on the semantic meaning and
sentence length.
There are multiple programs available for grading subjective answers, but these programs have
some issues that the creators of the "ASSESS" system aimed to address. The systems being
considered include an Automated Grading System, which emphasizes a knowledge-based approach
instead of just a keyword-matching algorithm. The system employs ontology to link domains related
to a given keyword, and LSA and dictionary mapping ensure that pertinent answers receive credit.
Although grammar and syntax are verified, they do not influence the overall score as long as the
concept is thoroughly explained. However, the primary limitation of this system is that it does not
provide any feedback or information to the student regarding their errors.
Another program employs various machine learning techniques such as Latent Semantic Analysis,
Generalized Latent Semantic Analysis, the Maximum Entropy technique, and Bilingual Evaluation
Understudy (BLEU) to capture the latent relationship between words. This program measures the
relationship between words, words and concepts, and uses ontology for answer evaluation. The
techniques described in this paper demonstrate a strong correlation (up to 90%) with human
performance.
[9] Automated Descriptive Answer Evaluation System Using Machine Learning
Authors: Ms. Sharmeen J. Shaikh, Ms. Prerana S. Patil, Ms. Jagruti A. Pardhe,
Ms. Sayali V. Marathe, Ms. Sonal P. Patil
To create a software application that can assess descriptive answers and assign marks based on the
accuracy of the response, a user must first log in to the system for authentication. Once
authenticated, users can access the questions provided by the system. The proposed system evaluates
answers by comparing them with a standard answer stored in the database, which contains the
description, meaning, and keywords. The system then matches the keywords or key concepts, as
well as synonyms, with the answer and checks the grammar and spelling of the words. The answer
is then graded based on its accuracy, with the evaluation process consisting of three main steps:
extracting keywords and synonyms, matching the keywords, and weighting the keywords to
generate a score. The system assigns grades based on the number of keywords matched.
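A minimal sketch of keyword-weighted grading in this spirit (the keywords, weights, and marking scale below are invented for illustration, not taken from the cited system):

# Hypothetical keyword-weighted scoring: marks are assigned in
# proportion to the weighted fraction of model keywords found.
def score_answer(answer, keyword_weights, max_marks=5):
    words = set(answer.lower().split())
    total = sum(keyword_weights.values())
    matched = sum(w for kw, w in keyword_weights.items() if kw in words)
    return round(max_marks * matched / total, 1)

keywords = {'stack': 3, 'lifo': 2, 'push': 1, 'pop': 1}   # evaluator-defined weights
student = "A stack is a LIFO structure supporting push and pop operations"
print(score_answer(student, keywords))   # -> 5.0 when all keywords are matched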
[10] Subjective Answers Evaluation Using Machine Learning and Natural Language
Processing
Authors: Muhammad Farrukh Bashir, Hamza Arshad, Abdul Rehman Javed,
Natalia Kryvinska, and Shahab S. Band
They utilized Chinese automatic segmentation techniques and subjective ontologies to produce a k-
dimensional LSI space matrix. The responses were represented as TF-IDF embedding matrices and
then underwent Singular Value Decomposition to construct a semantic space of vectors. LSI was
employed to mitigate issues with synonyms and polysemy. Ultimately, cosine similarity was used
to calculate the similarity between responses. The dataset consisted of 35 categories and 850
instances evaluated by teachers, and the findings indicated a 5% variation in grading between the
teacher evaluations and the proposed system.
A second approach, due to Kusner et al., employed no hyper-parameters and used a relaxed Word
Mover's Distance (WMD) method to relax the constraints of the vector space. The dataset contained
eight real-world collections, such as Twitter sentiment data and BBC sports articles. The Word2vec
model from Google News was used, as well as two custom models that were trained. The testing
data was classified using the KNN approach. As a result, the relaxed WMD approach decreased
error rates and resulted in classification speeds 2 to 5 times faster.
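The LSI pipeline described above can be sketched with scikit-learn (illustrative data; the cited work's own implementation differs):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "A stack stores elements in last-in first-out order",         # model answer
    "Stacks keep items so the last inserted is removed first",    # student answer
    "Photosynthesis converts light energy into chemical energy",  # unrelated text
]
tfidf = TfidfVectorizer().fit_transform(docs)             # TF-IDF term-document matrix
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)   # SVD -> k-dimensional LSI space
print(cosine_similarity(lsi[0:1], lsi[1:2]))   # similar answers -> high similarity
print(cosine_similarity(lsi[0:1], lsi[2:3]))   # unrelated text -> low similarity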
Assessing a person's competence and skills is typically done through exams, which can either be
subjective or multiple choice in format. Objective exams are easier to grade automatically, saving
resources and effort. However, many exams are still subjective, and automated grading is currently
only available for objective exams. Finding a similar solution for subjective exams remains a
challenge. Administering exams is a tedious task for educational institutions, involving the
distribution of exam papers, students writing answers, and multiple levels of checking by examiners
and authorities responsible for processing the sheets.
The process of converting PDF files to text files through OCR is commonly used. In the education
setting, teachers may provide model answers in a notepad file and calculate the similarity level of
students' answers to the provided model. If the similarity level exceeds a certain threshold, students
receive full marks. However, this system may not always be reliable due to variations in word choice
or missing answer types, and there is no weighting system for keywords or points. As a result, the
accuracy of this system may not accurately reflect real-life evaluation systems.
In contrast, some teachers may evaluate students' answers based on a model answer provided by the
teacher. The weight assigned to factors such as answer length and grammar can vary depending on
the course, and the system of assigning weight is not commonly used by evaluators. Additionally,
this system does not take into account synonyms, which can lead to discrepancies when students
use different words to convey the same meaning.
The input for the process was an image dataset, which underwent a pre-processing step including
resizing and conversion to grayscale. Text was then extracted from the pre-processed image using
mean standard deviation and pytesseract, and stored in text format. Natural Language Processing
was applied to clean the extracted text, and the number of words and letters were calculated. Finally,
an Artificial Neural Network (ANN) deep learning algorithm was implemented and experimental
results were obtained, including performance metrics such as accuracy and evaluation based on the
number of words and letters.
The authors used a latent semantic categorization method to evaluate subjective queries online,
utilizing Chinese automatic segmentation techniques and subjective ontologies to construct a k-
dimensional LSI area matrix. Answers were represented in TF-IDF embedding matrices, and
Singular Value Decomposition (SVD) was applied to the term-document matrix to create a semantic
space of vectors. LSI helped to address issues with word ambiguity. Cosine similarity was used to
calculate the similarity between answers, with a dataset of 35 classes and 850 instances marked by
teachers. The results showed a 5% difference in grading between the teacher and the proposed
system.
Kusner et al. introduced the novel concept of using Word Mover's Distance (WMD) to measure
dissimilarity between two texts. The system utilized no hyper-parameters and a relaxed approach to
loosen the vector space bounds. The dataset included eight real-world sets, including Twitter
sentiment data and BBC sports articles. The Word2vec model from Google News and two custom
models were used. A K-Nearest Neighbor (KNN) classification approach was employed to classify
the testing data. As a result, relaxed W.M.D. reduced error rates and led to two to five times faster
classification.
[15] Machine Learning based Automatic Answer Checker Imitating Human Way of
Answer Checking
Authors: Vishwas Tanwar
In today's world, there are various ways to conduct exams, including online exams, OMR sheet
exams, and MCQ type exams. Examinations are conducted frequently around the world, and an
essential aspect of any exam is the evaluation of students' answer sheets. This task is usually
performed manually by the teacher, which can be very time-consuming, especially if the number of
students is large. In such cases, automating the answer checking process would be very useful.
Automation would not only reduce the workload of the teacher but also make the checking process
more transparent and fair by eliminating any chances of bias. Although there are several online tools
available for checking multiple-choice questions, there are only a few tools to evaluate subjective
answer type exams.
An automated answer checking system grades written answers much as a human examiner would. The
system requires users to create an account, which is available to administrative staff. The
administrator can add questions and their subjective solutions to the system, which are stored in
notepad files. When a user takes the test, they are provided with questions and a space to type their
answers. After the user submits their answers, the system compares them to the main answer stored
in the database and assigns grades accordingly, even if the responses are not identical. The system
incorporates Artificial Intelligence (AI) tools that evaluate answers and assign grades similarly to a
human grader. The designers used a CNN, image processing, and a ResNet image feature extractor to
create the system. An accuracy-checking algorithm measures character error rates.
The system was designed for use with a device and scanner, along with software applications that
can evaluate MCQ examination tests with questions having four options, and students can only
select one answer per question. The software analyzes the question paper to identify the response to
each question by matching it with an accurate answer stored in the database. The program is user-
friendly and utilizes OpenCV to facilitate image processing.
By considering the issues that the Question Answering (QA) field deals with, we can establish
a fundamental structure for a system designed for open-domain question answering.
Typically, such a system comprises three main components. The first module is responsible
for extracting information that pertains to the question being asked. This involves identifying
the type of question, determining the expected answer format, and creating a basic query that
includes a range of relevant keywords, phrases, and entities, as well as information about the
syntactic and semantic relationships between the words in the question. This query can then
be enhanced with additional information, such as synonyms or translations, in the case of
multi-lingual QA systems.
The functioning of this system involves attempting to identify potential answers by extracting the
relevant content from a predefined template or model answer that is provided within the framework
for question answering. The correctness of an answer is always subject to scrutiny, and as such, we
need to determine its level of confidence by comparing it to the model answer. During the evaluation
process, not every word in the answer carries equal importance, so it is necessary to evaluate the
answer as a single sentence. The system assigns scores based on the level of matching between
packages of words and keywords.
Subjective evaluation refers to the process of assessing responses to Descriptive, Define or Explain
type of questions, which are designed to measure a candidate's understanding of the concepts in a
particular subject. Our system has an efficiency rate of over 80% in evaluating one-word and one-
sentence answers. Paraphrasing is used to evaluate variations in vocabulary use when answering
single-sentence questions. While objective-based answering systems are common in online
education courses and are easy to evaluate, traditional university-level courses require subjective or
descriptive evaluations to test a student's conceptual understanding. Various online examination
systems are available in the market, such as web-based evaluation systems for computer education
and online annotation research and practices. Web-based educational technologies provide insight
into effective learning strategies and how students learn.
Handwritten recognition is a relatively narrow field of study in the domain of pattern recognition
and image processing, with a growing need for optical character recognition of handwritten scripts.
In this section, a thorough examination of existing research on handwritten recognition systems that
rely on various machine learning techniques is presented. While recognizing printed text has
become a well-established practice, recognizing handwritten text remains a challenging task due to
the significant variation in handwriting among individuals, including differences in letter or digit
size, orientation, thickness, format, and dimension. Several machine learning approaches have been
proposed for handwritten text recognition, such as the automated grading of handwritten answers
and the identification of handwritten alphabets and digits in different languages. This section
discusses different machine learning classifiers used in handwritten recognition methods.
Manual evaluation of subjective papers can be challenging and tedious due to the subjective nature
of the task. Analyzing subjective papers using AI presents its own set of challenges, such as
inadequate understanding and acceptance of data. While previous attempts have used traditional
counting methods and specific words to score student answers, there is a shortage of curated datasets
for this purpose. To address this, a new approach is proposed in this paper that employs various
machine learning and natural language processing techniques, including WordNet, Word2vec, word
mover’s distance (WMD), cosine similarity, multinomial naive Bayes (MNB), and term frequency-
inverse document frequency (TF-IDF). This approach uses solution statements and keywords to
evaluate descriptive answers, and a machine learning model is trained to predict grades. The study
found that WMD outperformed cosine similarity, and with sufficient training, the machine learning
model could function as a standalone tool. The accuracy rate achieved was 88% without the MNB
model, which was further reduced by 1.3% with its inclusion.
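A compressed sketch of the grade-prediction idea using TF-IDF features and multinomial naive Bayes (the tiny dataset below is invented for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: answers paired with teacher-assigned grades
answers = [
    "a stack is last in first out with push and pop",
    "stack uses lifo order push adds pop removes",
    "a queue is first in first out",
    "i do not know the answer",
]
grades = [5, 5, 2, 0]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(answers, grades)   # TF-IDF features feed the MNB classifier
print(model.predict(["stack follows lifo with push and pop operations"]))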
Education is undergoing a significant transformation with the use of machine learning (ML), which
is revolutionizing teaching, learning, and research. ML is being employed by educators to identify
struggling students early and intervene to improve their success and retention rates. Researchers are
also utilizing ML to expedite their work and uncover new discoveries and insights. One proposed
approach involves creating an ML model for mark evaluation. The project's initial step
is gathering past related data to build a dataset, which is then pre-processed to eliminate irrelevant
data. After the dataset is analyzed, it is prepared for training. Machine learning is most commonly
used in the Mark Evaluation domain to minimize human errors. Various algorithms are utilized to
train the model, and the most effective one is selected. The chosen algorithm is then saved as a
model file, which is utilized for making predictions.
CHAPTER 3
SYSTEM DESIGN
3.1 GENERAL
3) Usability: The system should have a user-friendly interface that is simple to navigate and
understand.
4) Accuracy: The system should be dependable and precise, providing accurate evaluations
based on established criteria.
5) Security: The system should be secure and safeguarded from potential threats, with the
necessary measures in place to prevent unauthorized access and protect user data.
6) Scalability: The system should be able to manage an increasing number of users and
submissions without affecting its performance or reliability.
Completing a project is an ideal way to start using Python for machine learning. By doing so,
you will be prompted to install and launch the Python interpreter, obtain an overview of how
to work through a small project, and gain confidence that you can embark on your own small
projects.
When working on a machine learning project with your own datasets, it may not be a linear
process, but it typically involves a number of well-known stages, such as defining the problem,
preparing the data, evaluating algorithms, improving results, and presenting the final outcome.
3.4 DESIGN OF THE ENTIRE SYSTEM
3.4.1 SYSTEM FLOW DIAGRAM
The student provides a written response to the questions on the staff's exam paper, which is then
compared to the answer key using specific modules. The result is then evaluated to determine if the
student has passed or failed, and this information is presented to the evaluator.
3.4.2 ARCHITECTURE DIAGRAM
The diagram of the system architecture illustrates the process of how data is collected and
transformed into the final output for a given input. It starts with the questions and answers dataset
and eventually generates results indicating whether the input has passed or failed using various
modules, trained datasets, and predictions.
3.4.3 USE CASE DIAGRAM
Use case diagrams are used to perform high-level requirement analysis of a system. During the
analysis of the system's requirements, its functionalities are identified and documented as use cases,
which serve as an organized, systematic representation of the system's functionalities.
3.4.4 ACTIVITY DIAGRAM
An activity in a system represents a specific operation or task. Activity diagrams are utilized not
only for visualizing the dynamic nature of a system but also for constructing an executable system
using forward and reverse engineering techniques. The only aspect that activity diagrams do not
depict is the message flow between activities. Therefore, activity diagrams are sometimes referred
to as flow charts, although they are not exactly the same. Activity diagrams depict various types of
flow, including parallel, branched, concurrent, and single, whereas flow charts do not necessarily
capture these types of flow.
CHAPTER 4
PROJECT DESCRIPTION
4.1 METHODOLOGIES
4.1.1 MODULES
➢ Data Pre-processing
➢ Data Visualization
➢ Algorithm Implementation
• Decision tree Classifier
• Random forest Classifier
• Ada Boost Classifier
➢ Deployment
4.2 MODULES DESCRIPTION
4.2.1 DATA PRE-PROCESSING
In machine learning, validation techniques are used to estimate the error rate of the ML model,
which is taken to be close to the true error rate on the dataset's population. While a large enough
dataset can represent the population, real-world work with small samples may not. Validation
techniques are also used to detect missing or duplicate values and to identify the data type of each
variable. A held-out validation set provides an impartial evaluation of a model's performance and
is used to fine-tune its hyperparameters. However, as the validation set's skill is incorporated into
the model configuration, the evaluation becomes progressively more biased. Machine learning
engineers use this data for frequent evaluation and tuning of the model's hyperparameters. Data
collection, analysis, and addressing the data's content, quality, and structure can be a
time-consuming process. It is essential to understand the data and its properties during the data
identification process, which helps in selecting the algorithm for building the model.
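A minimal sketch of carving out such a validation set, separate from the final test set (the split proportions are illustrative):

from sklearn.model_selection import train_test_split

X, y = list(range(100)), [i % 2 for i in range(100)]
# 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)
print(len(X_train), len(X_val), len(X_test))   # 60 20 20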
4.2.2 DATA VISUALIZATION
Data visualization is a crucial skill in applied statistics and machine learning. While statistics deals
with quantitative descriptions and estimations of data, data visualization provides a set of important
tools for understanding data qualitatively. It is useful for exploring and familiarizing oneself with a
dataset and can aid in identifying patterns, outliers, corrupt data, and more. With some domain
knowledge, data visualizations can express and demonstrate key relationships in plots and charts
that are more impactful to stakeholders than measures of association or significance. Data
visualization and exploratory data analysis are entire fields in themselves, and it is recommended to
delve deeper into some of the suggested books.
Visualizing data through charts and plots can help make sense of it, even when the data itself may
not be immediately understandable. The ability to quickly visualize data samples is a valuable skill
in both applied statistics and machine learning. This includes understanding different types of plots
that are useful for visualizing data in Python, such as line plots for time series data and bar charts
for categorical quantities. Additionally, histograms and box plots are useful for summarizing data
distributions.
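A short sketch of these plot types on synthetic data:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.default_rng(0).normal(size=200)   # synthetic sample
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].hist(data, bins=20)    # histogram: shape of the distribution
axes[0].set_title('Histogram')
axes[1].boxplot(data)          # box plot: median, quartiles, outliers
axes[1].set_title('Box plot')
plt.show()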
4.2.3 ALGORITHM IMPLEMENTATION
4.2.3.1 DECISION TREE CLASSIFIER
Decision Tree is a technique for Supervised learning that can solve both Regression and
Classification problems, though it is primarily used for the latter. This classifier takes the form of a
tree structure, where the internal nodes of the tree correspond to features in the dataset, branches
represent decision rules, and each leaf node corresponds to an outcome. The Decision Node is a
node that makes decisions and has multiple branches, while the Leaf Node is the result of those
decisions and does not have any further branches. This approach uses the features of the dataset to
perform tests and make decisions. It is a visual representation of all possible solutions to a
problem/decision based on the given conditions. This technique is known as a decision tree because,
like a tree, it starts with a root node that expands into further branches to form a tree-like structure.
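A minimal decision-tree sketch on scikit-learn's bundled Iris data (illustrative, not the project's dataset):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)            # internal nodes test dataset features
print(tree.score(X_test, y_test))     # accuracy on held-out data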
4.2.3.2 RANDOM FOREST CLASSIFIER
The Random Forest is a well-known algorithm in machine learning that is categorized under
supervised learning. It can effectively handle both classification and regression tasks in machine
learning. Its foundation lies in ensemble learning, which is a method of combining multiple
classifiers to solve complex problems and improve model performance.
Random Forest is a classifier comprising of numerous decision trees, each trained on a different
subset of the given dataset, and it aggregates their predictions to enhance the accuracy of the dataset.
Unlike relying on the output of a single decision tree, Random Forest considers the predictions of
every tree and decides the final output based on the majority vote of predictions. Moreover, having
a higher number of trees in the forest can lead to more accurate predictions and also helps to prevent
overfitting.
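The same toy setup with a random forest, where the final class is decided by the majority vote across trees:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)          # each tree is trained on a bootstrap sample
print(forest.score(X_test, y_test))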
4.2.3.3 ADA BOOST CLASSIFIER
AdaBoost, also known as Adaptive Boosting, is an ensemble boosting classifier created by Yoav
Freund and Robert Schapire in 1996. The primary purpose of AdaBoost is to enhance the accuracy
of classifiers by combining multiple classifiers. It operates as an iterative ensemble method.
AdaBoost builds a strong classifier by merging several poorly performing classifiers, leading to a
highly accurate strong classifier. The fundamental idea behind AdaBoost is to set weights for
classifiers and train data samples iteratively, ensuring accurate predictions of unusual observations.
Any machine learning algorithm that accepts weights on the training set can be used as a base
classifier. AdaBoost must fulfill two conditions to work efficiently. First, the classifier should be
trained iteratively on variously weighted training examples. Second, in each iteration, it should
attempt to fit these examples well by minimizing training error.
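And with AdaBoost, which reweights misclassified samples each round (the default base learners are shallow decision stumps):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_train, y_train)           # each round focuses on the previous errors
print(boost.score(X_test, y_test))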
4.2.4 FLASK
Flask is a micro web framework that is created using the Python programming language. This
framework is categorized as a micro-framework because it does not require specific tools or libraries
to function. Unlike other frameworks, Flask does not come with pre-built features such as database
abstraction layer or form validation. However, Flask allows the use of extensions that can add
features to the application as if they were part of Flask itself. Flask is an excellent framework for
building REST APIs and has access to all of Python's powerful features since it is built on top of it.
Although Flask is primarily used for the backend, it uses a templating language called Jinja2 to
create HTML, XML, or other markup formats that are sent to the user via an HTTP request. Flask
has a modular and lightweight design that makes it easy to transform it into the web framework you
require by adding a few extensions without adding any extra weight. Additionally, the foundation
API is well-structured and coherent.
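A minimal Flask sketch showing a route rendering a Jinja2 template (assumes a templates/hello.html file exists alongside the script):

from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def hello():
    # Jinja2 fills the template with the passed variables
    return render_template('hello.html', name='student')

if __name__ == '__main__':
    app.run(port=5000)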
CHAPTER 5
RESULT AND DISCUSSION
The development of an automatic descriptive answer evaluator system has enabled the assessment
of descriptive answers provided by students. This system compares a student's answer with the
staff's answer to the same question, using a variety of natural language processing techniques to
determine the degree of similarity between the two answers.
The evaluation process involves the use of NLP techniques such as keyword matching and sentence
similarity to compare the student's answer with the staff's answer. The system then assigns a score
to the answer based on the degree of similarity between the two answers, and can also identify
whether the student's answer has been plagiarized.
The system has been tested on a dataset of descriptive answers provided by students, and the results
show that it is highly accurate in its assessments. The system has been tested on various question
types and topics, with consistent results across all tests.
One limitation of the system is that it relies on pre-defined staff answers, which may restrict its
ability to assess answers that differ significantly from the staff's answer. Additionally, the system
may not be able to evaluate the quality of an answer beyond its similarity to the staff's answer.
In summary, the automatic descriptive answer evaluator system has demonstrated great potential in
accurately assessing descriptive answers provided by students. This system has the potential to save
a significant amount of time and effort for educators who need to assess large volumes of student
answers.
5.1.1 ACCURACY
In evaluating the performance of the automatic descriptive answer evaluator system, accuracy is
used as a metric. This metric is particularly useful when all the answers carry the same level of
significance. The calculation of accuracy involves dividing the total number of correct evaluations
by the total number of evaluations made. By using accuracy, the system is able to determine how
effectively it can assess the students' answers. Nevertheless, it is worth mentioning that relying
solely on accuracy may not be enough to provide a comprehensive evaluation of the answers. Other
metrics may also be required for a more comprehensive assessment.
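As a small worked example of this calculation (the marks below are invented):

from sklearn.metrics import accuracy_score

actual    = [5, 3, 4, 2, 5, 1]   # hypothetical teacher-assigned marks
predicted = [5, 3, 4, 3, 5, 1]   # marks produced by the system
print(accuracy_score(actual, predicted))   # 5 of 6 correct -> about 0.83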
5.1.2 LOSS
The system calculates the loss to determine the gradients in relation to the model's parameters, which
are subsequently modified via backpropagation. This process is performed iteratively, with the
system being updated each time until there is no further improvement in the desired evaluation
metric. By reducing the loss, the system can enhance its capacity to correctly evaluate descriptive
answers given by students. Nonetheless, it is important to keep in mind that the selection of the loss
function can considerably affect the system's performance, and selecting an appropriate loss
function is a crucial step in developing an effective automatic descriptive answer evaluator system.
The system can assess its performance during the training process by using a validation dataset. The validation
dataset is a subset of data that is not used for training but rather to evaluate the system's accuracy.
The system compares the predicted answers with the actual answers in the validation dataset to
calculate its validation accuracy. This metric is important in estimating the system's ability to
perform on new and unseen data. It helps to monitor the system's performance during training and
avoid overfitting. The use of a validation dataset and validation accuracy is crucial in ensuring that
the automatic descriptive answer evaluator system accurately assesses descriptive answers provided
by students.
The validation loss is a performance metric utilized to evaluate machine learning models on the
validation dataset. The validation dataset is a fraction of the entire dataset used to check the model's
performance. The validation loss metric is similar to the training loss and is computed by summing
up the errors for each example in the validation dataset.
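A small sketch of computing a validation loss (log loss is assumed here; the report does not name its loss function):

from sklearn.metrics import log_loss

y_val = [1, 0, 1, 1]            # true labels in the validation set
probs = [0.9, 0.2, 0.7, 0.6]    # model's predicted probabilities
print(log_loss(y_val, probs))   # averaged error over the examples; lower is better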
Table 5.1 lists the algorithms used and their respective accuracies. Among these, the Random
Forest algorithm outperformed the others with an accuracy of 93.4647%.
CHAPTER 6
6.1 CONCLUSION
The analytical process begins with data cleaning and processing, addressing missing values,
performing exploratory analysis, and eventually building and evaluating models. The aim is to find
the algorithm with the highest accuracy score on a public test set, which will be used in an
application for mark evaluation. While many educational institutions conduct online
exams, mostly in the form of MCQs, our project aims to evaluate descriptive answers as well.
MCQ-based examinations are useful for assessing a student's aptitude, but they cannot measure
theoretical knowledge. The proposed system evaluates subjective answers based on keywords,
comparing them against the model answer, allocating marks to the student accordingly, and
providing an equivalent grade for the answer.
A potential improvement would be to utilize cloud deployment for increased scalability and
accessibility. By migrating to the cloud, the system would be able to manage larger amounts of data
and multiple users simultaneously, which would expand its potential applications for educators in
diverse environments. Moreover, the system could be optimized to work seamlessly with the
Internet of Things (IoT) system, which would provide additional flexibility and ease of use.
Integration with IoT would allow the system to be accessed from a wider range of devices and
locations, thereby increasing its impact on the education sector. These upgrades would significantly
augment the capabilities of the automatic descriptive answer evaluator system, making it even more
valuable for both educators and students.
APPENDICES
APPENDIX I
SAMPLE CODE:
DATA PRE-PROCESSING:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd

data = pd.read_csv('Mark.csv')
data

# Shape before removing the null data
data.shape

# Shape after removing the null data
df = data.dropna()
df.shape

df.isnull().sum()
df.info()
df.columns
df.duplicated()
df.duplicated().sum()
df.ANSWER.unique()
df.MARK.unique()
df.KEYWORD.value_counts()

# Before LabelEncoder
df.head()

from sklearn.preprocessing import LabelEncoder
var_mod = ['MARK', 'ANSWER', 'KEYWORD']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i]).astype(int)

# After LabelEncoder
df.head()
df.corr()
df.describe()
DATA VISUALIZATION:
#import library packages
import pandas as p
import matplotlib.pyplot as plt
import seaborn as s
import numpy as n
import warnings
warnings.filterwarnings("ignore")
#Load given dataset
data = p.read_csv('MARK.csv')
df=data.dropna()
df
df.columns
df.groupby('ANSWER').describe()
print(vocab)
41
warnings.filterwarnings("ignore")
# Load the given dataset
df = pd.read_csv('Mark.csv')
df
df.shape
type(df['KEYWORD'].loc[100])
df.info()

# Data cleaning and preprocessing
import re
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

ps = PorterStemmer()
corpus = []
stop_words = set(stopwords.words('english'))
for i in range(0, len(df)):
    # Keep only alphanumeric characters in the keyword and answer text
    review0 = re.sub('[^a-zA-Z0-9]', ' ', str(df['KEYWORD'][i]))
    review1 = re.sub('[^a-zA-Z0-9]', ' ', str(df['ANSWER'][i]))
    review = review0 + review1
    review = review.lower()
    review = review.split()
    # Stop-word removal and stemming (reconstructed: the extracted listing
    # imported these tools but the loop body was cut off)
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# TF-IDF features ('tv' was undefined in the extracted listing; a
# TfidfVectorizer is assumed)
tv = TfidfVectorizer()
X = tv.fit_transform(corpus).toarray()
X
X.shape

df['MARK'].value_counts()
y = df['MARK']
print(y)

# Train Test Split (the split itself was missing from the listing)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
FLASK:
from flask import Flask,render_template,url_for,request
import pandas as pd
import joblib
app = Flask(__name__)
48
@app.route('/')
def home():
return render_template('home.html')
@app.route('/predict',methods=['POST'])
def predict():
if request.method == 'POST':
message = request.form['message']
data = [message]
print(data)
vect = cv.transform(data).toarray()
my_prediction = clf.predict(vect)
print(my_prediction)
return render_template('result.html',prediction = my_prediction)
if __name__ == '__main__':
app.run(debug=False, port=7000)
49
APPENDIX II
PAPER PUBLISHED AND CERTIFICATE
We submitted our paper to ViTECoN 2023 on March 23rd at Vellore Institute of Technology
(VIT), Vellore, India.
APPENDIX III
CO-PO-PSO MAPPING
PO5: Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex engineering
activities with an understanding of the limitations.
PO6: The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
PO7: Environment and sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the knowledge
of, and need for sustainable development.
PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PO9: Individual and teamwork: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
PO11: Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
PO12: Life-long learning: Recognize the need for and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological change.
PSO1: Foundation Skills: Ability to understand, analyze and develop computer programs in
the areas related to algorithms, system software, web design, machine learning, data analytics,
and networking for efficient design of computer-based systems of varying complexity.
Familiarity and practical competence with a broad range of programming languages and open-
source platforms.
REFERENCES
1. Nikam, P., Shinde, M., Mahajan, R., & Kadam, S. (2015). "Automatic Evaluation of
Descriptive Answer Using Pattern Matching Algorithm". International Journal of Computer
Sciences and Engineering.
2. Ravikumar, M., Sampath Kumar, S., & Shivakumar, G. (2021). "Automation of Answer
Scripts Evaluation - A Review".
3. Sinha, P., & Kaul, A. (2018). "Answer Evaluation Using Machine Learning".
4. Patil, P., Patil, S., Miniyar, V., & Bandal, A. (2018). “Subjective Answer Evaluation Using
Machine Learning”. International Journal of Pure and Applied Mathematics, 118(24).
5. Jagadamba, G., & Chaya Shree, G. (2020). "Online subjective answer verifying system using
artificial intelligence”. Proceedings of the 4th International Conference on IoT in Social,
Mobile, Analytics and Cloud, ISMAC 2020, 1023–1027.
6. Rao, M. V., Harshitha, I. S., Sukruthi, Y., & Sudharshan, T. (2020). “Automatic Answer
Script Evaluator”. INTERNATIONAL JOURNAL OF SCIENTIFIC & TECHNOLOGY
RESEARCH, 9, 1.
7. Salomi Victoria, R. D., & Grace Vinitha, V. P. (2020). “Intelligent Short Answer Assessment
using Machine Learning”. International Journal of Engineering and Advanced Technology.
8. Johri, E., Chandak, P., Dedhia, N., Adhikari, H., & Bohra, K. (n.d.). “ASSESS-Automated
subjective answer evaluation using Semantic Learning”.
9. Shaikh, S. J., Patil, P. S., Pardhe, J. A., Marathe, S. V., & Patil, S. P. (n.d.). "Automated
Descriptive Answer Evaluation System Using Machine Learning".
10. Dhokrat, A., Hanumant R, G., & Mahender, C. N. (2012). “Assessment of Answers: Online
Subjective Examination”.
11. Bashir, M. F., Arshad, H., Javed, A. R., Kryvinska, N., & Band, S. S. (2021). “Subjective
Answers Evaluation Using Machine Learning and Natural Language Processing.” IEEE
Access, 9, 158972–158983.
12. Chakravarty, N., & Maggu, S. (2021). "Subjective Answer-Checker". International
Advanced Research Journal in Science, Engineering and Technology, 8(11).
13. Singh, S., Shah, Y., Vajani, Y., & Dholay, S. (n.d.). “Automated Paper Evaluation System
for Subjective Handwritten Answers”.
14. Kumar, P., Pramod, N., Mandada Samemi, Tirumala Sai Hareesha, & Gudluru Venkata
Siva Sai. (n.d.). "Automatic Answer Evaluation Using Machine Learning".
15. Prof. Sumedha P Raut, Siddhesh D Chaudhari, Varun B Waghole, Pruthviraj U Jadhav, &
Abhishek B Saste. (2022). “Automatic Evaluation of Descriptive Answers Using NLP and
Machine Learning”. International Journal of Advanced Research in Science,
Communication and Technology, 735–745.
16. Ivor Uhliarik, "Handwritten Character Recognition Using Machine Learning Methods",
Comenius University in Bratislava, Faculty of Mathematics, Physics and Informatics, Thesis.
17. S. Mori, C. Y. Suen and K. Yamamoto, "Historical review of OCR research and
development," Proc. of IEEE, vol. 80, pp. 1029-1058, July 1992.
18. Piyush Patil, Sachin Patil, Vaibhav Miniyar, Amol Bandal, "Subjective Answer Evaluation
Using Machine Learning", International Journal of Pure and Applied Mathematics,
pp. 1-13, 2018.
19. Pranali Nikam, Mayuri Shinde, Rajashree Mahajan and Shashikala Kadam, "Automatic
Evaluation of Descriptive Answer Using Pattern Matching Algorithm", International Journal
of Computer Sciences and Engineering, pp. 69-70, 2015.
20. Nilima Sandip Gite, "Implementation of Descriptive Examination and Assessment System",
International Journal of Advance Research in Science and Engineering, vol. 7,
pp. 252-257, 2018.