
Dissertation Part-I

Name: Kamalpreet Kaur
Roll No.: 2018CSB2015
Guide: Prof. Kiranbir Kaur

Index

• Introduction
• Base Paper
• Literature Survey
• Comparison Table
• Research Gap
• Problem Definition
• Methodology and Algorithm
• Conclusion
• References

Introduction
Monitoring student performance is critical for indicating which class a student should select to perform better. Several techniques play a part in selecting the features to consider when choosing a student's class, and these techniques belong to data mining. Data mining is a filtering mechanism that selects the required data from a large dataset. The mining approach for student class selection and prediction comprises the following phases:
• Integrity check
• Feature selection
• Classification
1. Integrity Check

This phase checks whether the data is valid. Integrity constraints are used to determine whether data is valid, and they are divided into the following categories:

a. Entity integrity constraint

b. Referential integrity constraint

The entity integrity constraint requires that the primary key be unique and contain no null value. For example, consider the student table in Table 1, where every Rollno value is unique and non-null, so Rollno can act as the primary key.

The referential integrity constraint requires that each foreign key value either match a primary key value in the other table or be null. The table containing the primary key is known as the parent table, and the table containing the foreign key is known as the child table. Consider the student and fees tables (Tables 2 and 3) for demonstrating this integrity constraint.

Table 1: Student table

Name     Rollno  Marks  Class
Sonu     100     100    BCA
Radhika  200     100    BCA
Munish   201     200    BSc
Kalian   300     200    BA
Rama     305     200    BSc
Sonia    400     200    MSc
Rama     403     100    MSc
Sonu     502     100    BCA

Table 2: Student table

Name     Rollno  Marks  Class
Sonu     100     100    BCA
Radhika  200     100    BCA
Munish   201     200    BSc
Kalian   300     200    BA
Rama     305     200    BSc
Sonia    400     200    MSc
Rama     403     100    MSc
Sonu     502     100    BCA

Table 3: Fees table

Rollno  Name     Fees
100     Sonu     5000
200     Radhika  2000
201     Munish   3000
201     Kalian   4000

In this example, Table 2 (student) is the parent table and Table 3 (fees) is the child table. Each Rollno in the fees table must match a Rollno in the student table or be null. This is known as the referential integrity constraint.
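Both constraints can be checked mechanically. The following is a minimal Python sketch, assuming the rows of Tables 2 and 3 are held in memory as tuples. The function names `entity_violations` and `referential_violations` are illustrative and do not come from the original work.

```python
# Rows copied from Table 2 (student) and Table 3 (fees) as printed.
STUDENTS = [  # (name, rollno, marks, class)
    ("Sonu", 100, 100, "BCA"), ("Radhika", 200, 100, "BCA"),
    ("Munish", 201, 200, "BSc"), ("Kalian", 300, 200, "BA"),
    ("Rama", 305, 200, "BSc"), ("Sonia", 400, 200, "MSc"),
    ("Rama", 403, 100, "MSc"), ("Sonu", 502, 100, "BCA"),
]
FEES = [  # (rollno, name, fees)
    (100, "Sonu", 5000), (200, "Radhika", 2000),
    (201, "Munish", 3000), (201, "Kalian", 4000),
]


def entity_violations(primary_keys):
    """Return primary key values that are null or duplicated."""
    seen, bad = set(), []
    for key in primary_keys:
        if key is None or key in seen:
            bad.append(key)
        seen.add(key)
    return bad


def referential_violations(foreign_keys, primary_keys):
    """Return non-null foreign key values with no match in the parent table."""
    valid = set(primary_keys)
    return [fk for fk in foreign_keys if fk is not None and fk not in valid]


student_rollnos = [row[1] for row in STUDENTS]
fee_rollnos = [row[0] for row in FEES]

print(entity_violations(student_rollnos))                    # []
print(referential_violations(fee_rollnos, student_rollnos))  # []
# A hypothetical orphan Rollno (999) is reported; a null foreign key is allowed.
print(referential_violations(fee_rollnos + [999, None], student_rollnos))  # [999]
```

The check only confirms that every fees Rollno exists in the student table; it does not compare the Name columns of the two tables.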
2. Feature Selection
Feature selection is the mechanism of determining which features of the dataset to use. Individual fields are extracted from the dataset, and this feature extraction can be driven by an optimization procedure such as a genetic approach, ant colony optimization, or particle swarm optimization.
A statistical approach can be used to select the features required for student performance prediction. Candidate measures include mean, median, mode, kurtosis, regression, and correlation. The chosen features are extracted and selected for student performance prediction.
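As an illustration of the statistical approach, the sketch below computes the listed summary measures for the Marks column of Table 1 and ranks two candidate features by their absolute Pearson correlation with marks. The `study_hours` and `seat_number` columns are hypothetical values invented for the example, and correlation ranking is only one possible selection criterion, not necessarily the one used in this work.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def excess_kurtosis(xs):
    """Population excess kurtosis: fourth moment / variance^2 - 3."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0


marks = [100, 100, 200, 200, 200, 200, 100, 100]  # Marks column of Table 1

summary = {
    "mean": statistics.fmean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),  # ties resolve to the first value seen
    "kurtosis": excess_kurtosis(marks),
}

# Hypothetical candidate features, one value per student.
candidates = {
    "study_hours": [2, 3, 6, 7, 6, 8, 3, 2],
    "seat_number": [5, 9, 6, 8, 7, 5, 9, 6],
}

# Rank features by the strength of their linear relationship with marks.
ranked = sorted(
    ((name, pearson(values, marks)) for name, values in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

print(summary)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Features near the top of the ranking would be kept and weakly correlated ones dropped; the cut-off is a design choice left open here.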
3. Classification
This is the last phase of the student performance prediction process. The combined contribution of the different features determines the class in which the result lies. Class prediction often suffers from a deviation whose size depends on the received values. The outcome of classification is expressed as a confusion matrix, and the critical parameter to observe is classification accuracy.
This work is based on an associative-learning approach to student performance prediction. The overall goal is to increase classification accuracy. The next sections present the base paper and the existing literature on this topic.
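To make the confusion matrix and classification accuracy concrete, here is a minimal Python sketch. The `pass`/`fail` labels and the predicted values are hypothetical and serve only to show how the matrix is built and how accuracy is read from its diagonal.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["pass", "fail"]
actual = ["pass", "pass", "pass", "fail", "fail", "pass", "fail", "pass"]
predicted = ["pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]

matrix = confusion_matrix(actual, predicted, labels)
print(matrix)            # [[4, 1], [1, 2]]
print(accuracy(matrix))  # 0.75
```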
Base Paper

Title: Predicting students’ final degree classification using an extended profile
Authors: Sahar Al-Sudani & Ramaswamy Palaniappan
Received: 23 October 2018 / Accepted: 18 January 2019 / Published online: 2 February 2019
Literature Survey
This study centres on the performance evaluation of students. Data collected for this purpose accumulates into Big Data. This section presents the tools and techniques used to evaluate and filter Big Data for student performance evaluation.

(Al-Sudani 2019) proposed a neural-network-based approach for predicting student performance, using a dataset of 491 students. A feed-forward network is used to tackle the uncertainties present in the dataset. The classification accuracy achieved is 83-85%, depending on those uncertainties. The result is compared against k-nearest neighbour and Naïve Bayes approaches for validation. Although the prediction model is effective, it does not generate recommendations for weak students that could address the retention problem.

(Khder 2018) proposed a classification model for university students. The dataset used is real-time and considerably large. Prediction uses a mining-based random forest approach, which achieves classification accuracy of up to 89%. The high execution time is left to be addressed in future work.

(Bekele and Menzel 2017) proposed a Bayesian-network-based student performance system, presented through a case study of Ethiopian students. The Bayesian network is used as the prediction tool, but the dataset is small in scale and no confusion matrix is formed. In future work, a confusion matrix could be incorporated into the Bayesian network approach.

(Eashwar and Venkatesan 2017) proposed student performance evaluation using SVM, which is based on forming hyperplanes. Segmentation divides the data into critical and non-critical segments, and k-means segmentation is used for classification. The classification accuracy is 85%, which could be improved further by incorporating missing-data handling mechanisms.

(Zaffar and Hashmani 2017) discussed a big data approach to student performance monitoring through feature selection. Student data is gathered both in real time and through an offline dataset formation mechanism to form a synthetic dataset. Categorization is applied for fast analysis, and a parallel data analysis approach is used to analyze the distinct categories, also known as clusters. Execution time is reduced, but reliability is at stake because no accuracy optimization mechanism is applied.

(Singh et al. 2016) proposed a data mining approach for predicting student performance. The technique is used to judge the ranking of universities so that a student can seek admission to the best possible institution. The analysis uses supervised learning and a rapid mining tool, and a noise handling mechanism is employed to handle noisy data.

(Muthukrishnan 2018) presented a survey of techniques for student performance prediction, including mechanisms for handling prediction problems caused by missing data. Big data formation in education is yet to be explored, and the paper studies distinct papers to identify tools and techniques suitable for prediction. However, it does not present mechanisms for enhancing prediction accuracy.

(Dogan and Diri 2016) discussed mechanisms for handling predictions in MOOC analysis. The importance of learners, teachers, managers, and policy makers in MOOC analysis is judged, and the result is presented in terms of classification accuracy. It is a review paper and suggests no new mechanism for improving classification accuracy.
Comparison Table

Author and Reference | Technique | Parameters | Results | Shortcoming
Al-Sudani 2019 | Neural network | Classification accuracy, specificity, sensitivity | 83% to 85% accuracy | No recommendation mechanism suggested to tackle the retention problem
Khder 2018 | Classification model for university students | Accuracy | 89% | Missing filtering mechanism that could be used to reduce execution time
Bekele and Menzel 2017 | Bayesian-network-based approach | Parameters are not discussed | Reliability specified without percentage specification | Confusion matrix is not integrated
Eashwar and Venkatesan 2017 | Support vector machine | Classification accuracy | 85% | Missing-data handling mechanism is not employed
Zaffar and Hashmani 2017 | Feature selection approach | Feature classification | 88% | Feature selection approach does not consider the reliability of prediction
Singh et al. 2016 | Rapid mining approach | Accuracy | Not specified | Rapid mining approach cannot handle distinct categories of data
Dogan and Diri 2016 | MOOC analysis | Accuracy | Not specified | This mechanism can only be employed for distance-based courses

Research Gap

Predicting student performance using learning analytics is the need of the hour. It is required because of uncertainty in course selection on the student's side, and this uncertainty can be overcome with the help of technology.

The shortcomings of the existing approach are listed below:

• The paper uses a dataset that is real-time in nature and maintained in a specified format. Any noise in the dataset, such as missing data or wrong data input, could hamper the performance of the system.
• The base paper uses a neural-network-based approach with 100 nodes in the processing layer. Weight adjustment with 100 nodes using supervised learning gives classification accuracy below 90%.
• No recommendations are specified for weak students to boost their performance.

Problem Definition

There are three phases associated with the prediction of student performance. The first phase covers gathering data and handling noise. The existing system employs no format-handling mechanism, so if the format of the data is incorrect, prediction accuracy will be low. In the second phase, the data is fed to a neural network. The network uses supervised learning, but if the training and testing data do not match, not all instances can be classified. In the third phase, classification is performed using a hyperplane strategy, and weights are adjusted when a feature vector from the test data does not match the training data. Feature vector formation for training can become slow as the size of the dataset increases.

Methodology & Algorithm
This work proposes a three-phase approach to predict student performance along with recommendations. In the first phase, a format-handling mechanism based on normalization is employed. This phase identifies the initial percentage of each student and eliminates noisy data. The mechanism uses the concept of domain and range for each specified field in the dataset: data not falling within the domain and range is identified as noisy. Using the student ID, each missing value is identified and replaced with the average value of the corresponding field.
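The first phase can be sketched as follows. This is a minimal Python illustration, assuming each record is a dictionary with an `id` and a `percentage` field and that the valid range is 0 to 100. The field names, the range, and the choice to repair out-of-range values in the same way as missing ones are assumptions made for the example.

```python
def clean_field(records, field, low, high):
    """Replace missing or out-of-range values of `field` with the field mean.

    Returns the cleaned records and the IDs of the records that were repaired.
    """
    def is_valid(value):
        return value is not None and low <= value <= high

    valid = [r[field] for r in records if is_valid(r[field])]
    if not valid:
        raise ValueError(f"no valid values found for field {field!r}")
    mean = sum(valid) / len(valid)

    cleaned, repaired_ids = [], []
    for record in records:
        if not is_valid(record[field]):
            repaired_ids.append(record["id"])
            record = {**record, field: mean}  # copy, leaving the input untouched
        cleaned.append(record)
    return cleaned, repaired_ids


# Hypothetical records: one missing value and one value outside 0..100.
records = [
    {"id": 1, "percentage": 72},
    {"id": 2, "percentage": None},
    {"id": 3, "percentage": 150},
    {"id": 4, "percentage": 64},
    {"id": 5, "percentage": 80},
]

cleaned, repaired = clean_field(records, "percentage", 0, 100)
print(repaired)                            # [2, 3]
print([r["percentage"] for r in cleaned])  # [72, 72.0, 72.0, 64, 80]
```

The mean is computed only from values that pass the range check, so a noisy entry such as 150 does not distort the value used for imputation.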

In the second phase, the data from the pre-processing phase is fed into the neural network layers. An associative-learning approach is used, in which the learning of the previous neuron is fed into the next neuron, so learning is less time-consuming than with the supervised learning mechanism. Once feature vectors are formed, the test data is compared against the feature vectors formed from the training data.

In the third phase, the result is checked against the formed classes. If no class satisfies the result, the weights are adjusted. This process continues until a classification result is obtained. The methodology is listed below.

Algorithm for Student Performance Prediction
• Form the real-time dataset
• Apply the pre-processing mechanism using the integrity check approach
• Feed the data into the neural network
  - The input layer accepts input from the pre-processing phase
  - The training phase extracts feature vectors from the variables by applying the associative learning procedure
  - Weight adjustment is applied to make the result fall within a particular class
• Perform classification using the weight adjustment mechanism

Flowchart: Dataset formation using real-time approach → Pre-processing using format handling → Feed data into neural network → Weight-adjustment-based classification
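
The text above does not specify an update rule for the weight adjustment step. As a stand-in, the sketch below uses the classic single-layer perceptron rule, which adjusts the weights whenever a sample is misclassified and stops once every sample falls in its class. It is not the associative-learning network proposed in this work, and the two features and their values are hypothetical.

```python
def predict(weights, bias, x):
    """Return class 1 if the weighted sum is non-negative, else class 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else 0


def train_perceptron(samples, labels, lr=0.1, max_epochs=100):
    """Adjust weights on each misclassified sample until none remain."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            predicted = predict(weights, bias, x)
            if predicted != y:
                delta = lr * (y - predicted)  # +lr or -lr
                weights = [w + delta * xi for w, xi in zip(weights, x)]
                bias += delta
                errors += 1
        if errors == 0:  # every sample now falls in its class
            break
    return weights, bias


# Hypothetical normalized features: (attendance, internal marks).
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7),
           (0.2, 0.3), (0.3, 0.1), (0.1, 0.2)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = strong performer, 0 = weak performer

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [1, 1, 1, 0, 0, 0]
```

The loop terminates early here because the toy data is linearly separable; on data that is not, it stops after `max_epochs` without guaranteeing a correct classification.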

Conclusion

The proposed system modifies the existing system in three places. In the first phase, a pre-processing mechanism is employed using a format-setting strategy; once missing data is handled, classification accuracy at the final stage is expected to increase. In the second phase, a neural-network-based approach using associative learning is applied. In the classification phase, a weight adjustment approach is used.

This entire procedure can be implemented in MATLAB 2018b. The results include a confusion matrix with accuracy, sensitivity, specificity, and F-score.
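
For reference, the reported measures follow from the four cells of a binary confusion matrix using their standard definitions. The Python sketch below uses hypothetical counts and is not tied to the MATLAB implementation mentioned above.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard measures derived from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts for 100 students.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```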
References
Al-Sudani S, Palaniappan R (2019) Predicting students’ final degree classification using an extended profile. Springer 2357–2369

Bekele R, Menzel W (2017) A Bayesian Approach to Predict Performance of Student (BAPPS): A Case with Ethiopian Students. IJAER

Dogan G, Diri B (2016) An Overview of Studies About Students’ Performance Analysis and Learning Analytics in MOOCs. IEEE Access

Eashwar KB, Venkatesan R (2017) Student Performance Prediction Using SVM. IJMET 8:649–662

Khder M (2018) A Classification and Prediction Model for Student’s Performance in University. Res Gate. doi: 10.3844/jcssp.2017.228.233

Muthukrishnan SM (2018) Big Data Framework for Students’ Academic Performance Prediction: A Systematic Literature Review. 2018 IEEE Symp Comput Appl Ind Electron 376–382

Singh I, Sabitha ASAI, Bansal A, Cse A (2016) Student Performance Analysis Using. IEEE Access 294–299

Zaffar M, Hashmani MA (2017) Performance Analysis of Feature Selection Algorithm for Educational Data Mining. IEEE Access 7–12