Major Project Report Sem 7

The document describes a project report submitted by three students towards completion of their major project for a Bachelor of Technology degree in Computer Science and Engineering. The project aims to develop a student academic prediction app using linear regression to predict student marks based on academic and personal data like psychological factors. The report includes an abstract, table of contents, introduction on the need for the project, review of previous works on student performance prediction using machine learning, proposed methodology for data collection, preprocessing, feature extraction and model construction.

Uploaded by

Rudra Priyanka

STUDENT ACADEMIC PREDICTION

A PROJECT REPORT

Submitted by

Saikat Chowdhury 1901227319


Rudra Shankar 1901227316
Sandesh Kumar 1901227321

Towards completion of major project


(7th semester)
of

BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE & ENGINEERING

Department of Computer Science & Engineering

C.V RAMAN GLOBAL UNIVERSITY


BHUBANESWAR- ODISHA - 752054

September 2022
C.V RAMAN GLOBAL UNIVERSITY
BHUBANESWAR-ODISHA-752054

CERTIFICATE OF APPROVAL

This is to certify that we have examined the project entitled "Student Academic
Prediction App" submitted by Saikat Chowdhury, Registration No.-1901227319,
Rudra Shankar, Registration No.- 1901227316, Sandesh Kumar, Registration No.-
1910227321, CGU-Odisha, Bhubaneswar. We hereby accord our approval of it as a
major project work carried out and presented in a manner required for its acceptance
towards completion of major project stage-I (7th Semester) of Bachelor Degree of
Computer Science & Engineering for which it has been submitted. This approval
does not necessarily endorse or accept every statement made, opinion expressed or
conclusions drawn as recorded in this major project, it only signifies the acceptance of
the major project for the purpose it has been submitted.

Project Guide Internal Examiner


ACKNOWLEDGEMENT
We would like to express our deep gratitude to our project guide Prof. Soumya
Sahoo, Professor, Department of Computer Science & Engineering, who has always
been a source of motivation and firm support in carrying out the project.
We would also like to convey our sincerest gratitude and indebtedness to all other
faculty members and staff of the Department of Computer Science & Engineering, who
gave their effort and guidance at appropriate times and without whom our project
work would have been very difficult.
An assemblage of this nature could never have been attempted without reference to and
inspiration from the works of others, whose details are mentioned in the references section.
We acknowledge our indebtedness to all of them. Further, we would like to express our
gratitude towards our parents and God, who directly or indirectly encouraged and
motivated us during this work.
ABSTRACT

Every educational organization aims to provide good and fruitful knowledge to its
students. Many educational institutions are investing in educational data mining
to predict student academic performance from previous marks. However, with the
rapid growth of recent technologies, students are increasingly distracted by
social media, and during the COVID-19 pandemic many students faced
mental health issues that affected their academic performance. Our project therefore
considers students' academic data along with their personal and
psychological data to predict their performance more effectively using the
linear regression algorithm.
TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT
LIST OF FIGURES

1. INTRODUCTION

2. PREVIOUS WORK

3. METHODOLOGY

3.1. DATA COLLECTION

3.2. DATA PREPROCESSING

3.3. FEATURE EXTRACTION

3.4. MODEL CONSTRUCTION

4. SOURCE CODE

5. RESULT AND DISCUSSION

6. CONCLUSION AND FUTURE WORK

7. REFERENCES
LIST OF FIGURES

• Fig1. Distribution of final grade of students

• Fig2. Dataset of the student

• Fig3. Proposed system for student academic performance prediction

• Fig4. Self-assessment topics

• Fig5. Workflow of the model

• Fig6. Source Code

• Fig7. Graph between the predicted output and test output

• Fig8. User Interface

• Fig9. How marks are affected by attributes


1. INTRODUCTION

Quality education is essential for the development of a country. The amount of data
in the education domain is increasing every day with emerging e-learning technologies.
Because the volume of student academic data is so large, most of it remains unused. Data
mining is an efficient way to find useful information in huge data sets using
knowledge discovery in databases. It is used in multiple domains, including medicine,
banking and education, where it is called educational data mining. This app focuses
on this unused data to predict student academic outcomes. Stakeholders
in this domain want an early warning system that predicts learning outcomes at an
early stage.

Improving the quality of educational processes is the biggest challenge in today's
scenario. Teachers can track their students, identify the main causes behind their
academic performance and help improve it. The prediction
result can also help students understand their current standing in the
respective subjects so that they can plan their study patterns accordingly. Using this
app can have many positive impacts on students and educational institutions, such as
higher rankings, better job opportunities and improved institutional reputation.

To analyze the data, well-known algorithms such as Artificial Neural
Network (ANN), decision tree, regression analysis and K-Nearest Neighbor (KNN) are
used for prediction. The uniqueness of the model lies in its ability to combine
psychological and personal data with academic data. The objective of the model is
to achieve the highest possible accuracy in predicting the
percentage marks of the student.

2. Previous Work

Dorina et al. [1] proposed a predictive model of student performance by classifying
students into two classes (successful/unsuccessful). The proposed model was
constructed under the CRISP-DM (Cross Industry Standard Process for Data Mining) research
approach. The classification algorithms OneR, J48, MLP and IBK were applied to the given
dataset. The results show that the highest accuracy (73.59%) was achieved by the MLP model
for identifying successful students, while the other three models performed better at
identifying unsuccessful students. The model was unable to handle high data
dimensionality and class balancing problems. Edin Osmanbegovic et al. [2] built a model
to predict student academic success in a course while reducing the data dimensionality
problem. Various machine learning classifiers such as NB, MLP and J48 were
evaluated in this study. The results show that Naïve Bayes achieved the highest accuracy,
69.65%. The class imbalance problem cannot be handled by this model.

To address the class imbalance and data dimensionality issues, Carlos et al. [3]
developed a student failure prediction model based on machine learning techniques. Ten
classifiers were applied to the dataset, and the ICRM classifier achieved the highest
accuracy, 92.7%. Because student characteristics differ at each institutional
level, the performance of the model was not tested at every level of education.
Another educational data mining challenge is to predict student dropouts
from their respective courses [4]. The results show that the support vector machine model
combined with the predictor variables was the most accurate at classifying the
data. A limitation of this study was the inclusion of the attribute for earned grades of
prerequisite courses in the dataset, because a student's understanding of the
prerequisites may have increased during the course of study.

Ajay et al. [5] did research on student
performance prediction. The study's key contribution was a new social attribute
termed "CAT", which describes how Indians were classified into four groups in the
past based on their social standing and other factors, all of which have a direct impact on
student education. The results indicated that the IBI model achieved the highest accuracy
(82%). Ramanathan et al. [6] created an improved version of the ID3 model, which predicts
student academic success. The flaw of the ID3 model was its tendency to choose the
attributes with the most values as nodes, which produced an inefficient tree. The proposed
model overcomes this problem. It generates two output classes (Pass and
Fail). The classifiers J48, wID3 and Naïve Bayes were applied and their results
compared. The wID3 classifier achieved a high accuracy of 93%.

Alaa Khalaf et al. [7] proposed a
model that predicts student success in different courses. This research used
three decision tree classifiers: J48, Hoeffding tree and REPTree. REPTree achieved the
maximum accuracy of 91 percent. The model was unable to handle high-dimensional
data and class balancing. Dech Thammasiri et al. [8] suggested a
methodology for early detection of freshmen's poor academic performance. To overcome
the problem of class imbalance, four classification methods and three balancing methods
were used. The combination of support vector machine and SMOTE achieved the
highest overall accuracy, 90.24%.

An early warning system based on learning portfolio data was presented to predict
student learning performance during an online course [9]. The results revealed that
techniques accompanied by time-dependent factors were
more accurate than those without them. This model was not tested in offline mode, where
performance could be hampered by the time-dependent properties. Previously, it
was considered that data mining algorithms performed effectively only with huge data
sets, but one study showed that data mining is also appropriate for small datasets [10]. This
study offered a model for predicting student achievement. A small dataset of
student academic data was used with three decision tree approaches (REPTree, J48,
M5P). The results claim that REPTree obtained the highest accuracy among them, above 90%.
This model does not address the
high data dimensionality and class balancing problems.

3. Methodology

The literature review above raises common issues such as class imbalance, data
complexity and classification error. This app proposes a model with the following
phases. Fig. 3 shows the main steps of the proposed model.

3.1 Data Collection

The student dataset used in this model was collected from Kaggle [10]. This is a
dataset from the University of California, Irvine's dataset repository. This dataset
comprises students' final results at the end of a math curriculum, together with many
features that may or may not influence the students' future outcomes.

Fig1. Distribution of final grade of students


Fig2. Dataset of the student

3.2 Data preprocessing

In data mining, pre-processing is crucial. Its goal is to convert raw data into a
format that mining algorithms can understand. During this phase, the following tasks are
completed.

Data Integration

Data integration is the process of combining data from several sources into a single
repository. When it comes to integrating data, redundancy is a regular issue. The dataset
consists of attributes with redundant values, such as school and age.

Data Cleaning

Missing and noisy data are dealt with in this step in order to ensure data
consistency. There are no missing data or outliers in the dataset used in this investigation.
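
The check described above can be sketched as follows. This is only an illustrative sketch, not the code used in the project: the column names `absences` and `G3`, the sample values, and the 1.5 x IQR outlier rule are all assumptions made for the example.

```python
import statistics

def count_missing(rows):
    """Return {column: number of rows whose value is None or an empty string}."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts.setdefault(col, 0)
            if val is None or val == "":
                counts[col] += 1
    return counts

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical records: one missing value and one extreme value in "absences".
absences = [0, 1, 2, 3, 3, 4, 5, 6, 40, None]
grades = [12, 11, 13, 10, 14, 9, 12, 11, 8, 10]
rows = [{"absences": a, "G3": g} for a, g in zip(absences, grades)]

missing = count_missing(rows)
present = [r["absences"] for r in rows if r["absences"] is not None]
outliers = iqr_outliers(present)
print(missing, outliers)
```

A dataset with no missing data and no outliers, as reported for this study, would give zero counts for every column and an empty outlier list.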

Discretization

The discretization technique is used to convert numerical values into nominal
values in the desired data, because some classifiers cannot be applied to continuous
data. In this
dataset the attributes whose values are changed to nominal values are sex, schoolsup,
familysub, paid, romantic, activities, nursery, higher, internet, family status, teacher,
health, parents' job and guardian.
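
As a rough illustration of discretization in general, the sketch below maps a numeric grade to a nominal label. The cut points and label names are hypothetical and are not taken from the report; the report does not state which bins were used.

```python
from bisect import bisect_right

def discretize(value, edges, labels):
    """Map a numeric value to the label of the interval it falls in.

    edges are ascending cut points; a value equal to a cut point falls in
    the higher interval, so len(labels) must equal len(edges) + 1.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]

# Hypothetical cut points for a 0-20 grade scale.
EDGES = [10, 14, 17]
LABELS = ["fail", "satisfactory", "good", "excellent"]

grades = [4, 10, 13, 14, 19]
nominal = [discretize(g, EDGES, LABELS) for g in grades]
print(nominal)
```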
Fig3. Proposed system for student academic performance prediction

3.3 Feature Extraction

Many attributes in the student performance dataset could be inappropriate for
classification purposes. When a huge number of variables can influence
student performance, such as educational background, social, demographic, family and
socioeconomic factors, the problem of high data dimensionality arises. This
problem can be solved by picking out key features from the dataset.

The goal of feature selection is to choose a subset of features that can accurately
describe the input data while reducing the complexity of the feature space and eliminating
extraneous data. Wrapper-based and filter-based approaches are the two most common
types of feature selection methods. The filter method looks for the smallest number of
relevant features while ignoring the rest. It ranks the features using variable ranking
algorithms, with the highest rated features being picked and applied to the learning
process.
To evaluate the feature ranks, this study used a filter approach with a selection
algorithm based on information gain. It determines which features are most relevant when
creating a performance model for students. During feature selection, a rank value is
assigned to each feature according to its influence on data classification.
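
Ranking by information gain can be sketched as below. This is a minimal illustration under stated assumptions rather than the project's implementation: the toy records, the attribute names `internet` and `higher`, and the `result` target are hypothetical, and all features are assumed to be nominal.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    remaining = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remaining

def rank_features(rows, target):
    """Return (feature, gain) pairs sorted from most to least informative."""
    labels = [r[target] for r in rows]
    features = [k for k in rows[0] if k != target]
    gains = {f: information_gain([r[f] for r in rows], labels) for f in features}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical records: "higher" separates the classes perfectly, "internet" not at all.
rows = [
    {"internet": "yes", "higher": "yes", "result": "pass"},
    {"internet": "no", "higher": "yes", "result": "pass"},
    {"internet": "yes", "higher": "no", "result": "fail"},
    {"internet": "no", "higher": "no", "result": "fail"},
]
ranking = rank_features(rows, "result")
print(ranking)
```

A filter method would then keep only the top-ranked features and pass them to the learning step.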

3.4 Model Construction

• In this model, the student first registers himself/herself, or the college
registers its students.

• The student accepts the terms and conditions, which state that students
must provide real data and that their information will be shared with a third-party app
for more accurate prediction.

• On accepting the terms and conditions, he or she takes the self-assessment
test, which includes questions from the following domains.

• Study Related Questions

These cover matriculation marks and higher education performance along
with backlogs, study hours, distance between the institution and home,
extracurricular activities and extra educational support.

• Family Related Questions

These cover the educational and financial background of the student's parents,
the total number of siblings and educational support from the family.

• Personal Questions

Here the student provides health-related information,
relationship status, social time, addictions and hobbies.

• Psychological Questions

Based on mental health apps such as What's Up, Happify and
MoodKit, a number of questions have been shortlisted to assess the mental
stability of the student.

• Storing of Data

The data is stored in our database and forwarded to our machine
learning model for processing.

• Use of Linear Regression

Linear regression is applied with the grade as the dependent variable (Y-axis), and
the grade is modeled as a linear combination of all the remaining attributes. The
predicted marks are then used to classify the student's performance.

• Confusion Matrix

The confusion matrix is used to find the accuracy and recall of the
predictions.

• Final Predicted Result

The student receives the result along with tips for improving their
academic performance.
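
The linear regression step in the list above can be sketched as an ordinary least squares fit. This is a simplified illustration under assumptions, not the project's source code: the two features (study hours and number of backlogs), the sample values, and the function names `fit_linear_regression` and `predict` are hypothetical.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    X is a list of feature rows; an intercept column of ones is added here.
    Returns [intercept, coef_1, ..., coef_n].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def predict(weights, row):
    """Predicted marks for one feature row."""
    return weights[0] + sum(w * float(v) for w, v in zip(weights[1:], row))

# Hypothetical data: [study hours, backlogs] -> percentage marks.
X = [[1, 0], [2, 1], [3, 0], [4, 2], [5, 1]]
y = [45, 47, 55, 54, 62]
weights = fit_linear_regression(X, y)
print(weights, predict(weights, [6, 0]))
```

In practice a library implementation would normally be preferred; the hand-written solver is used here only to keep the example self-contained.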

Fig4. Self-assessment topics


Fig5. Workflow of the model

4. Source Code

Fig6. Source Code


5. Result and Discussion

5.1 Model Evaluation

For our experiments, we used scikit-learn (sklearn). Its train_test_split function splits
our dataset in a 3:1 ratio, in which 75% of the data is used to train our model and 25% of
the data is used to test its accuracy. The process is repeated five times to get the final
result, which gives us 78% accuracy.
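
The repeated 3:1 split can be sketched in plain Python as below. This stands in for the library routine and is not the project's code: the helper names are hypothetical, and averaging the five scores is an assumption, since the report does not say how the five runs are combined.

```python
import random

def split_75_25(rows, seed):
    """Shuffle a copy of rows and return (train, test) in a 3:1 ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4
    return shuffled[:cut], shuffled[cut:]

def repeated_evaluation(rows, evaluate, repeats=5):
    """Average a score over several differently seeded splits."""
    scores = []
    for seed in range(repeats):
        train, test = split_75_25(rows, seed)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)

# Placeholder evaluation: the share of rows held out for testing.
# A real run would fit the model on train and score it on test.
def held_out_share(train, test):
    return len(test) / (len(train) + len(test))

data = list(range(20))
train, test = split_75_25(data, seed=0)
mean_score = repeated_evaluation(data, held_out_share)
print(len(train), len(test), mean_score)
```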

5.2 Evaluation measures

For the evaluation of classification quality in our experiments, we use the following
common measures. Details are as follows:

• CCI (Correctly Classified Instances): the number of instances that have been correctly
classified divided by the total number of instances. It is also known as accuracy.

Formula: (TP+TN)/(TP+FP+TN+FN)

• ICI (Incorrectly Classified Instances): the number of incorrectly
classified instances divided by the total number of instances.

Formula: (FP+FN)/(TP+FP+TN+FN)

• Recall: the number of correctly identified positive instances divided by the total
number of actual positive instances, also called the true positive rate (in our
experiments the recall value is almost the same as the CCI).

Formula: TPrate = TP/(TP+FN)

• F-Measure: calculated from the recall and precision values as twice the product of
precision and recall divided by the sum of recall and precision.

Formula: 2 × Precision × Recall / (Precision + Recall), where Precision = TP/(TP+FP)

• We also used the random forest algorithm and researched grey wolf
optimization.
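
The measures listed above can be computed from the four confusion matrix counts as in this short sketch. The counts are made up for illustration and are not the project's results; the function name `classification_measures` is hypothetical.

```python
def classification_measures(tp, fp, tn, fn):
    """Compute CCI, ICI, precision, recall and F-measure from confusion matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {
        "cci": (tp + tn) / total,   # accuracy
        "ici": (fp + fn) / total,   # error rate
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only.
m = classification_measures(tp=30, fp=10, tn=50, fn=10)
print(m)
```

CCI and ICI always sum to 1, which gives a quick sanity check on the counts.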
5.3 Grey Wolf Optimization

5.4 Result analysis

Fig7 describes the relation between the predicted results and the test results.
Fig7. Graph of comparisons of different algorithms

Fig8. User Interface

Fig9. How marks are affected by attributes


6. Conclusion

Machine learning algorithms can help teachers forecast student achievement sooner and
support decision-making to improve student performance. They can also extract
student performance criteria across many educational domains to develop a cohesive
taxonomy of student learning outcomes. Furthermore, machine learning algorithms can be
used to identify students who are likely to succeed academically and those
who are at risk of failing, so that at-risk students receive additional
support early and on time. Machine learning algorithms are also used to investigate
student learning behaviour, solve student academic problems, optimise the educational
environment and enable data-driven decision making. Finally, they help identify the key
factors that influence student academic success in schools, investigate the relationships
between these factors, and find the best methods for resampling and classifying
student learning outcome datasets.

Every educational institute nowadays needs an accurate student academic
performance prediction model. However, resolving data quality issues in student
performance prediction models is sometimes the most difficult task. This app
presents a student performance prediction model based on the supervised learning
technique of linear regression.
The model achieves 78% accuracy.

In future, the proposed model will be tested on larger datasets with a greater number of
attributes. A logical continuation of this research would be to build a meta-analysis
system on a larger dataset, which could serve as a decision support approach based on
the model that achieves the best efficiency and effectiveness.
Furthermore, using hybrid feature selection methods to predict student performance can
improve the study by making each feature more meaningful for student
performance prediction. Extreme gradient boosting, an advanced ensemble-based machine
learning approach, could also be employed in this domain.
7. References

[1] Dorina Kababchieva. (2012). Student Performance Prediction using Data Mining
Classification Algorithms. International Journal of Computer Science and Management
Research, vol. 1.

[2] Edin Osmanbegovic and Mirza Suljic. (2012). Data mining approach for predicting
student performance. Journal of Economics and Business, vol. X, Issue 1.

[3] Carlos Marques-Vera and Alberto Cano. (2013). Predicting student failure at school
using genetic programming. Appl Intell, vol. 38, pp. 315–330.

[4] Shaobo Huang and Ning Fang. (2013). Predicting student academic performance in an
engineering dynamic course: A comparison of four types of predictive mathematical
models. Computers & Education, vol. 61, pp. 133–145.
https://doi.org/10.1016/j.compedu.2012.08.015

[5] Ajay Kumar Pal and Saurabh Pal. (2013). Data Mining Techniques in EDM for
Predicting the Performance of Students. International Journal of Computer and
Information Technology, vol. 02, Issue 06.

[6] Ramanathan, Saksham Dhanda and Suresh Kumar D. (2013). Predicting Student
Performance using Modified ID3 Algorithm. International Journal of Engineering and
Technology, vol. 5, No. 3.

[7] Alaa Khalaf Hamoud. (2016). Selection of Best Decision Tree algorithm for prediction
and classification of student Action. American International Journal of Research in
Science, Technology, Engineering & Mathematics, vol. 1, pp. 26-32.

[8] Dech Thammasiri, Dursun Delen, Phayung Meesad and Nihat Kasap. (2014). A critical
assessment of imbalanced class distribution problem: The case of predicting freshmen
student attrition. Expert Systems with Applications, 41, pp. 321–330.
https://doi.org/10.1016/j.eswa.2013.07.046

[9] Ya-Han Hu, Chia-Ling L and Sheng-Pao Shih. (2014). Developing early warning
systems to predict students’ online learning performance. Computers in Human
Behavior, 36, pp. 469–478. https://doi.org/10.1016/j.chb.2014.04.002

[10] Srecko Natek and Moti Zwilling. (2014). Student data mining solution–knowledge
management system related to higher education institutions. Expert Systems with
Applications, 41, pp. 6400–6407. https://doi.org/10.1016/j.eswa.2014.04.024

[11] Uskov, Vladimir, Bakken, Jeffrey, Byerly, Adam. Machine Learning-based Predictive
Analytics of Student Academic Performance in STEM Education.
10.1109/EDUCON.2019.8725237
Source: https://ieeexplore.ieee.org/document/1263284?arnumber=1263284

[12] Ansar Siddique, Asiya Jan, Fiaz Majeed. Predicting Academic Performance Using
an Efficient Model Based on Fusion of Classifiers.
Source: https://www.mdpi.com/2076-3417/11/24/11845

[13] Havan Agarwal, Harshil Mavani. (2015). Student Performance Prediction using
Machine Learning. International Journal of Engineering Research and Technology.
http://dx.doi.org/10.17577/IJERTV4IS030127

[14] J. Dhilipan. Prediction of Students Performance using Machine learning.
Source: https://iopscience.iop.org/article/10.1088/1757-899X/1055/1/012122
