
Vol. 14 ICETCSE 2016 Special Issue International Journal of Computer Science and Information Security (IJCSIS)
ISSN 1947-5500 [https://sites.google.com/site/ijcsis/]

Student Performance Analysis Using Educational Data Mining

P Ramya
M.Tech Student, Gudlavalleru Engineering College, Gudlavalleru, Krishna (Dt), Vijayawada

M Mahesh Kumar
Asst Professor, Dept of IT, LakiReddy Balireddy College of Engineering, Mylavaram, Krishna (Dt), Vijayawada

Abstract: The software industry hires students from engineering colleges who communicate well, program well, and perform well academically. Most engineering institutions focus on student performance in these three areas, and engineering students have to improve their academic performance, programming skills, and communication skills. To help such students, we designed a project that predicts a student's performance before the semester exams are taken and before the results are announced. With this prediction, students can gauge their performance and improve their skills through proper planning or by changing their plans. This helps students improve academically, which eventually leads to good performance in their end examinations. Because this reduces stress, the suicide rate among students is also expected to fall. It could also help the country's development by providing good, efficient engineers.

We apply the Naive Bayes classification algorithm and the Weighted Naïve Bayesian algorithm to a student data set collected from the LBRCE IT department, Mylavaram, to build this model. Based on the results, we can identify weak students and take remedial measures to improve their performance.

Keywords: Educational Data Mining, Classification, Prediction.

I. INTRODUCTION

The advent of information technology in various fields has led to large volumes of data being stored in various formats such as records, files, documents, images, sound, videos, scientific data, and many new data formats. The data collected from different applications require a proper method of extracting knowledge from large repositories for better decision making. Knowledge discovery in databases (KDD), often called data mining, aims at the discovery of useful information from large collections of data [1]. The main function of data mining is to apply various methods and algorithms in order to discover and extract patterns in stored data [2]. Data mining and knowledge discovery applications have received considerable attention because of their significance in decision making, and they have become an essential component in various organizations. Data mining techniques have been introduced into the fields of Statistics, Databases, Machine Learning, Pattern Recognition, Artificial Intelligence, and Computation capabilities, among others.

There is increasing research interest in using data mining in education. This emerging field, called Educational Data Mining, is concerned with developing methods that discover knowledge from data originating in educational environments [3]. Educational Data Mining uses many techniques such as Decision Trees, Neural Networks, Naïve Bayes, K-Nearest Neighbor, and many others. Using these techniques, many kinds of knowledge can be discovered, such as association rules, classifications, and clusterings. The discovered knowledge can be used for prediction regarding enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in the result sheets of students, prediction of students' performance, and so on.

The main aim of this project is to improve student performance in studies based on some important factors. Education is an essential element for the betterment and progress of a country. It makes the people of a country civilized and well mannered. Nowadays, new methods are being developed to discover knowledge from educational databases in order to analyse students' trends and behaviours towards education, to analyse the data from different dimensions, to categorize it, and to summarize the relationships. This motivated us to work on student dataset analysis. At present, data collection, categorization, and classification are performed manually. The main disadvantage of this process is delay: remedial measures are not taken properly because student performance is analysed late, and late announcement of results leads to poor performance in the next examination because students cannot plan their preparation. As the number of students increases, analysing the performance of each student becomes difficult. To overcome this difficulty, we introduce educational data mining. When institutes store their students' details in the cloud, the resulting large data, often called big data, is difficult to analyse. By applying data mining to the stored data, we can easily categorize and analyse the results of a student in a short time. Here, we concentrate mainly on the students' internal marks, ability to concentrate, attendance, awareness of course outcomes, tutorials, semester marks, content perception, and assignments.

II. DATA MINING DEFINITION AND TECHNIQUES

Data mining, also popularly known as Knowledge Discovery in Databases, refers to extracting or "mining" knowledge from large amounts of data. Data mining techniques are used to operate on large volumes of data to discover hidden patterns and relationships helpful in decision making. While data mining and knowledge discovery in databases are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. The sequence of steps identified in extracting knowledge from data is shown in Figure 1.
Proceedings of 3rd International Conference on Emerging Technologies in Computer Science & Engineering (ICETCSE 2016)
V. R. Siddhartha Engineering College, Vijayawada, India, October 17-18, 2016

FIG 1: KDD PROCESS

Various algorithms and techniques such as Classification, Clustering, Regression, Artificial Intelligence, Neural Networks, Association Rules, Decision Trees, Genetic Algorithms, and the Nearest Neighbor method are used for knowledge discovery from databases. These techniques and methods need a brief mention for better understanding.

A. Classification

Classification is one of the most important techniques used in data mining. It is a two-step process: (1) build the classification model, and (2) predict the class label. It employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach regularly employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by the classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. The classifier-training algorithm uses the pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier.

B. Clustering

Clustering can be defined as the discovery of similar classes of objects. By using clustering techniques we can further identify dense and sparse regions in object space and can discover the overall distribution pattern and correlations among data attributes. The classification approach can also be used as an effective means of distinguishing groups or classes of objects, but it becomes costly, so clustering can be used as a preprocessing approach for attribute subset selection and classification.

C. Prediction

Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining, independent variables are attributes already known and response variables are what we want to predict. Unfortunately, many real-world problems are not simple prediction problems. Therefore, more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks too can create both classification and regression models.

D. Association rule

Association and correlation analysis is usually used to find frequent itemsets in large data sets. This type of finding helps businesses make certain decisions, such as catalogue design, marketing, and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. However, the number of association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little (if any) value.

E. Neural networks

A neural network is a set of connected input/output units in which each connection has a weight associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class labels of the input tuples. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited for continuous-valued inputs and outputs. Neural networks are best at identifying patterns or trends in data and are well suited for prediction or forecasting needs.

F. Decision Trees

Decision trees are tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

G. Nearest Neighbor Method

A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1). It is sometimes called the k-nearest neighbor technique.
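The nearest neighbor idea described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name knn_predict, the toy records, the choice of a mismatch count as the similarity measure, and the majority vote are all assumptions made for illustration.

```python
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k most similar records.

    Assumption: similarity between two categorical records is measured by
    the number of attributes on which they differ (fewer is more similar).
    """
    distances = sorted(
        (sum(a != b for a, b in zip(row, query)), index)
        for index, row in enumerate(train_rows)
    )
    votes = Counter(train_labels[index] for _, index in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical historical records: (attendance, assignment submitted)
rows = [("Good", "Yes"), ("Good", "No"), ("Poor", "No"),
        ("Poor", "Yes"), ("Average", "Yes")]
labels = ["Pass", "Pass", "Fail", "Fail", "Pass"]

prediction = knn_predict(rows, labels, ("Good", "Yes"), k=3)
print(prediction)
```

Choosing an odd k with two classes avoids tied votes in this toy setting; a real application would also need a similarity measure suited to its attributes.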


III. RELATED WORK

Data mining in higher education is a recent research field, and this area of research is gaining popularity because of its potential for educational institutes.

Data mining can be used in the educational field to enhance our understanding of the learning process by focusing on identifying, extracting, and evaluating variables related to the learning process of students, as described by Alaa el-Halees [4]. Mining in an educational environment is called Educational Data Mining.

Han and Kamber [3] describe data mining software that allows users to analyze data from different dimensions, categorize it, and summarize the relationships identified during the mining process.

Pandey and Pal [5] conducted a study on student performance by selecting 600 students from different colleges of Dr. R. M. L. Awadh University, Faizabad, India. By means of Bayes classification on category, language, and background qualification, it was determined whether newcomer students would perform well or not.

Hijazi and Naqvi [6] conducted a study on student performance by selecting a sample of 300 students (225 males, 75 females) from a group of colleges affiliated to Punjab University of Pakistan. The hypothesis framed was "Student's attitude towards attendance in class, hours spent in study on daily basis after college, students' family income, students' mother's age and mother's education are significantly related with student performance". By means of simple linear regression analysis, it was found that factors like mother's education and student's family income were highly correlated with student academic performance.

Khan [7] conducted a performance study on 400 students, comprising 200 boys and 200 girls, selected from the senior secondary school of Aligarh Muslim University, Aligarh, India, with the main objective of establishing the prognostic value of different measures of cognition, personality, and demographic variables for success at higher secondary level in the science stream. The selection was based on a cluster sampling technique in which the entire population of interest was divided into groups, or clusters, and a random sample of these clusters was selected for further analysis. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream and boys with low socio-economic status had relatively higher academic achievement in general.

Galit [8] gave a case study that uses student data to analyze learning behavior, predict results, and warn students at risk before their final exams.

Al-Radaideh, et al [9] applied a decision tree model to predict the final grade of students who studied the C++ course at Yarmouk University, Jordan, in the year 2005. Three different classification methods, namely ID3, C4.5, and Naïve Bayes, were used. Their results indicated that the decision tree model had better prediction than the other models.

Pandey and Pal [10] conducted a study on student performance by selecting 60 students from a degree college of Dr. R. M. L. Awadh University, Faizabad, India. By means of association rules they measured the interestingness of students' choice of class teaching language.

Ayesha, Mustafa, Sattar and Khan [11] describe the use of the k-means clustering algorithm to predict students' learning activities. The information generated after applying the data mining technique may be helpful for instructors as well as for students.

Bray [12], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China, and Sri Lanka. It was also observed that academic performance improved with the intensity of private tutoring, and that this intensity depends on a collective factor, namely socio-economic conditions.

Bhardwaj and Pal [13] conducted a study on student performance by selecting 300 students from 5 different degree colleges conducting the BCA (Bachelor of Computer Application) course of Dr. R. M. L. Awadh University, Faizabad, India. By means of a Bayesian classification method on 17 attributes, it was found that factors like students' grade in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income, and student's family status were highly correlated with student academic performance.

IV. DATA MINING PROCESS

In the present-day educational system, a student's performance is determined by the internal assessment and the end semester examination. The internal assessment is carried out by the teacher based on the student's performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance, and lab work. The end semester examination mark is the one scored by the student in the semester examination. Each student has to get minimum marks in both the internal assessment and the end semester examination to pass a semester.

A. Data Preparations

The data set used in this study was obtained from LakiReddy Bali Reddy College of Engineering, Information Technology department, Mylavaram, from sessions 2012 to 2016. Initially the size of the data is 50. In this step, data stored in different tables was joined into a single table, and after the joining process errors were removed.

B. Data selection and transformation

In this step only those fields were selected which were required for data mining. A few derived variables were selected, while some of the information for the variables was extracted from the database. All the predictor and response variables derived from the database are given in Table I for reference.
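As an illustration of the transformation step, the following Python sketch maps a percentage mark to the grade bands that Table I lists for the IM and PSM variables. It is only a sketch under stated assumptions: the function name grade_band is hypothetical, and because Table I leaves the exact boundary values (60, 45, 36) unassigned, the sketch assumes they fall into the higher band.

```python
def grade_band(percent):
    """Map a percentage mark to the IM/PSM grade bands of Table I."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Assumption: boundary marks (60, 45, 36) are placed in the higher band,
    # since Table I does not say where those exact values belong.
    if percent >= 60:
        return "A"
    if percent >= 45:
        return "B"
    if percent >= 36:
        return "C"
    return "Fail"

# Hypothetical marks, not taken from the study's data set
marks = [72, 50, 40, 20]
grades = [grade_band(m) for m in marks]
print(grades)
```

The same pattern would apply to the ESM variable by returning First, Second, Third, and Fail instead of A, B, C, and Fail.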


TABLE I. STUDENT RELATED VARIABLES

Variable            Description                            Possible Values
IM                  Internal Marks                         {A > 60%, B > 45% & < 60%, C > 36% & < 45%, Fail < 36%}
PSM                 Previous Semester Marks                {A > 60%, B > 45% & < 60%, C > 36% & < 45%, Fail < 36%}
Basics              Basics in the subject                  {Poor, Average, Good}
ACIC                Ability to Concentrate in the Class    {Poor, Average, Good}
ASS                 Assignment                             {Yes, No}
CP                  Content Perception                     {Poor, Average, Good}
ATT                 Attendance                             {Poor, Average, Good}
Awareness on CO's   Course Outcomes Awareness              {Yes, No}
ESM                 End Semester Marks                     {First > 60%, Second > 45% & < 60%, Third > 36% & < 45%, Fail < 36%}

The domain values for some of the variables were defined for the present investigation as follows:

Basics:
Knowing the basics helps students study effectively. It makes the subject easier to analyze and the concepts easier to remember for a longer time. It helps students generate new ideas, communicate ideas, thoughts, and information more clearly, and integrate new concepts with older concepts.

Ability to concentrate in the class:
Paying attention in class is important for gaining knowledge. Concentration in class helps students understand the subject more easily. By paying attention in class, students can do assignments and homework easily and can easily remember the topics covered. Taking notes in class makes studying easier, and a student who concentrates can take notes effectively, which helps with later reference.

Attendance:
The presence of a student in class can also improve his/her concentration in studies. Because marks are awarded for attendance, students attend classes regularly and so concentrate more on their studies. Students can also share knowledge and communicate easily with others.

Content perception:
By knowing the content perception of a student, the teacher can help the student understand the subject further. Content perception also shows whether the student listens or not.

Awareness on CO's:
Before learning a subject, students should be clear about what they are going to learn and why they are going to learn it. When a student knows the course outcomes before starting the course, it is easier for him/her to concentrate on the subject. With knowledge of the course outcomes, the student gains interest in starting the course and improves his/her knowledge.

Assignments:
To write assignments, students read the textbook, understand it, and prepare notes. When a student submits assignments frequently, the teacher can say that the student is regular and interested in learning on his/her own. Through assignments, students can learn the subject on their own. Moreover, writing about the subject, rather than only reading it, improves the student's concentration.

Internal marks:
The marks allotted to students are divided into internal and external marks. The external marks are those of the end (semester) exams. Dividing the marks makes it easier to assess student performance accurately, and internal marks let students assess their capability before the end exams.

Semester Marks:
The semester marks of a student are helpful in analyzing the performance of that student. The semester marks are the marks obtained by a student in his/her end exam. They are converted into percentages, and these percentages are used as cut-offs during campus placements. The previous semester marks are considered in order to improve the student's performance in the next semester, so that he/she can maintain the percentage needed to get a good job. By considering the semester-wise marks of a student, we can observe the change in that student's performance.

Tutorials:
By conducting tutorials, the staff or teacher can maintain a record of a student's performance. From the tutorials, the student can see where to concentrate in order to score more marks.

C. Proposed System

FIG 2: BLOCK DIAGRAM


To justify the capabilities of data mining techniques in the context of higher education, we designed a data mining model for the higher education system in the university, called "STUDENT PERFORMANCE ANALYSIS USING EDUCATIONAL DATA MINING". Using these techniques, many kinds of knowledge can be discovered, such as association rules, classifications, and clusterings. The main objective of this project is to use data mining methodologies to study students' performance in their courses. Data mining provides many tasks that could be used to study student performance. In this project, the classification task is used to evaluate students' performance, since there are many approaches available for data classification. Information such as attendance, class test, seminar, and assignment marks was collected from the student management system to predict the performance at the end of the semester. Compared with a survey method, this project reduces the time taken to collect and analyze the data and also reduces data entry errors. The software industry hires students from engineering colleges who communicate well, program well, and perform well academically, and most engineering institutions focus on students' performance in these factors. We apply the naive Bayes classification algorithm and the weighted naive Bayes algorithm to the student data set collected from the LBRCE IT department, Mylavaram, to build this model.

Modules include
1. Form Creation
2. Collection of Trained Datasets
3. Preprocessing Datasets Collected
4. Applying Naive Bayesian Classifier
5. Applying Weighted Naive Bayesian Classifier
6. Calculating Confusion Matrix
7. Calculating Precision, Recall & Specificity
8. Comparison by Graphical Representation

V. RESULTS AND DISCUSSION

The data set of 28 students used in this study was obtained from LakiReddy Bali Reddy College of Engineering, Dept of IT, Mylavaram, from 2012 to 2016.

SNo  ACO  Basics  ACOC    CP      IM  SM    ASS  TUT  ATT  PSM
1    No   Avg     Avg     Avg     A   Avg   No   Yes  A    Avg
2    Yes  Avg     Avg     Strong  C   Fail  No   No   C    Fail
3    No   Avg     Avg     Weak    C   Fail  No   No   B    Fail
4    No   Avg     Strong  Strong  B   Avg   No   No   B    Fail
5    No   Avg     Avg     Avg     B   Avg   Yes  No   C    Fail
6    No   Avg     Weak    Avg     C   Fail  No   No   B    Fail
7    No   Avg     Avg     Avg     A   Avg   Yes  No   A    Avg
8    No   Avg     Avg     Strong  A   Avg   Yes  Yes  A    Avg
9    No   Avg     Strong  Strong  C   Fail  No   No   C    Fail
10   No   Weak    Weak    Weak    C   Fail  No   No   C    Fail
11   No   Weak    Weak    Avg     B   Avg   No   Yes  B    Fail
12   No   Weak    Weak    Weak    D   Fail  No   No   C    Fail
13   No   Avg     Avg     Avg     B   Avg   Yes  Yes  B    Avg
14   Yes  Avg     Avg     Strong  C   Fail  No   No   C    Fail
15   No   Avg     Avg     Avg     A   Avg   No   Yes  A    Avg
16   Yes  Avg     Avg     Strong  C   Fail  No   No   C    Fail
17   No   Avg     Avg     Weak    C   Fail  No   No   B    Fail
18   No   Avg     Strong  Strong  B   Avg   No   No   B    Fail
19   No   Avg     Avg     Avg     B   Avg   Yes  No   C    Fail
20   No   Avg     Weak    Avg     C   Fail  No   No   B    Fail
21   No   Avg     Avg     Avg     A   Avg   Yes  No   A    Avg
22   No   Avg     Avg     Strong  A   Avg   Yes  Yes  A    Avg
23   No   Avg     Strong  Strong  C   Fail  No   No   C    Fail
24   No   Weak    Weak    Weak    C   Fail  No   No   C    Fail
25   No   Weak    Weak    Avg     B   Avg   No   Yes  B    Fail
26   No   Weak    Weak    Weak    D   Fail  No   No   C    Fail
27   No   Avg     Avg     Avg     B   Avg   Yes  Yes  B    Avg
28   Yes  Avg     Avg     Strong  C   Fail  No   No   C    Fail

TABLE 2: DATA SET OF STUDENTS

1. COLLECTION OF TRAINED DATASETS:

We created JSP files to store the training dataset and the test dataset in the database. Java Server Pages (JSP) is a technology for developing web pages that support dynamic content; it lets developers insert Java code in HTML pages by using special JSP tags. JSP is more powerful and easier to use. In the early days of the Web, the Common Gateway Interface (CGI) was the only tool for developing dynamic web content. However, CGI is not an efficient solution, and JSP is the better solution for dynamic web content. Released in 1999 by Sun Microsystems, JSP is similar to PHP and ASP, but it uses the Java programming language.

2. PRE-PROCESSING DATASETS COLLECTED:

Preprocessing is done in this module. Preprocessing techniques are data cleaning, data integration, data transformation, and data reduction. In our project we perform cleaning, transformation, and reduction. In cleaning we prune incomplete values, inconsistent values, and null values. All these errors are pruned by using the Java script. In transformation we convert marks into grades to classify the end results.

3. APPLYING NAÏVE BAYESIAN CLASSIFIER:

In our project we apply two algorithms to the student dataset to predict student performance. One is the Naive Bayesian algorithm and the other is the Weighted Naïve Bayesian algorithm. The Naïve Bayesian algorithm is based on Bayes' theorem. It is particularly suited to cases where the dimensionality of the inputs is high. Parameter estimation for naive Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations. Advantage: it requires only a small amount of training data to estimate the parameters. The Weighted Naïve Bayesian algorithm is also based on Bayes' theorem, but some weights
are assigned to the attributes that plays a major role in the 3.2 Weighted Naïve Bayesian Classification Algorithm : 74
student end result prediction. When compared to Naïve
Bayesian Algorithm , the weighted naive Bayesian algorithm The weighted Naive Bayesian Classification
gives more accurate results. represents a supervised learning method as well as a statistical
method for classification. Assumes an underlying probabilistic
3.1 Naive Bayesian Classification Algorithm: model and it allows us to capture uncertainty about the model in
a principled way by determining probabilities of the outcomes.
Bayesian classifiers are statistical classifiers. They can It can solve diagnostic and predictive problems. Bayesian
predict class membership probabilities, such as the probability classification provides practical learning algorithms and prior
that a given tuple belongs to a particular class. Bayesian knowledge and observed data can be combined. Bayesian
classification is based on Bayes theorem. Studies comparing Classification provides a useful perspective for understanding
classification algorithms have found a simple Bayesian and evaluating many learning algorithms. It calculates explicit
classifier known as the naïve Bayesian classifier to be probabilities for hypothesis and it is robust to noise in input
comparable in performance with decision tree and selected data. In this we are applying weights for each and every attribute.
neural network classifiers. Bayesian classifiers have also
exhibited high accuracy and speed when applied to large Algorithm
database. Bayesian classification is both a supervised learning method and a statistical method for classification. It assumes an underlying probabilistic model, which allows us to capture uncertainty about the model in a principled way by determining the probabilities of the outcomes. It can solve diagnostic and predictive problems. Bayesian classification provides practical learning algorithms in which prior knowledge and observed data can be combined, and it offers a useful perspective for understanding and evaluating many learning algorithms. It calculates explicit probabilities for hypotheses and is robust to noise in the input data.

Algorithm
Step-1: Let T be a training set of samples, each with its class label. There are k classes, $C_1, C_2, \ldots, C_k$. Each sample is represented by an n-dimensional vector, $X = \{x_1, x_2, \ldots, x_n\}$, holding the measured values of the n attributes $A_1, A_2, \ldots, A_n$, respectively.

Step-2: Calculate the prior probabilities
$p(C_i) = n(C_i)/m$, where $i = 1, 2, \ldots, k$,
with $n(C_i)$ the number of training samples in class $C_i$ and $m$ the total number of training samples in T.

Step-3: Posterior probabilities
$p(C_i \mid X) = [p(X \mid C_i)\, p(C_i)] / p(X)$.

Step-4: Calculate
$p(X \mid C_i) = \prod_{a=1}^{n} p(x_a \mid C_i)$.

Step-5: To predict the class label of X, $p(X \mid C_i)\, p(C_i)$ is evaluated for each class $C_i$. X is assigned to class $C_i$ if
$p(X \mid C_i)\, p(C_i) > p(X \mid C_j)\, p(C_j)$ for $1 \le j \le k$, $j \ne i$.

The weighted naïve Bayesian algorithm proceeds as follows.
Step-1: Let T be a training set of samples, each with its class label. There are k classes, $C_1, C_2, \ldots, C_k$. Each sample is represented by an n-dimensional vector, $X = \{x_1, x_2, \ldots, x_n\}$, holding the measured values of the n attributes $A_1, A_2, \ldots, A_n$, respectively.

Step-2: Calculate the prior probabilities
$p(C_i) = n(C_i)/m$, where $i = 1, 2, \ldots, k$.

Step-3: A weight is added for $x_n$ if its value is the highest among the values:
$[p(X \mid C_i)\, p(C_i)] + \text{Value}$.

Step-4: Otherwise, calculate
$p(X \mid C_i) = \prod_{a=1}^{n} p(x_a \mid C_i)$.

Step-5: To predict the class label of X, $p(X \mid C_i)\, p(C_i)$ is evaluated for each class $C_i$. X is assigned to class $C_i$ if
$p(X \mid C_i)\, p(C_i) > p(X \mid C_j)\, p(C_j)$ for $1 \le j \le k$, $j \ne i$.

Weights Included

The table below contains the Boolean-valued attribute weights on a 0-1 scale. These weights are added in the weighted naïve Bayesian algorithm in order to obtain more accurate results than the naïve Bayesian classifier. Boolean-valued attributes take binary values such as yes or no, true or false.

Boolean value attribute weights on a 0-1 scale

S.No          Awareness of CO's    Assignments      Tutorials
              Yes     No           Yes     No       Yes     No
Professor 1   0.18    0.0          0.22    0.0      0.22    0.0
Professor 2   0.20    0.0          0.18    0.0      0.18    0.0
Professor 3   0.22    0.0          0.20    0.0      0.20    0.0
Average       0.20    0.0          0.20    0.0      0.20    0.0

TABLE 3: BOOLEAN VALUE ATTRIBUTE WEIGHTS
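
Steps 1 to 5 of the (unweighted) naïve Bayesian algorithm can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`train`, `score`, `predict`), the six toy samples and the "Pass"/"Fail" labels are hypothetical and chosen only for illustration, and the sketch uses plain frequency counts without any smoothing or weighting.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    """Step-2 and the counts needed for Step-4, estimated by simple counting."""
    m = len(samples)                    # total number of training samples
    class_counts = Counter(labels)      # n(Ci) for every class Ci
    priors = {c: n / m for c, n in class_counts.items()}  # p(Ci) = n(Ci)/m
    # value_counts[c][a][v]: class-c samples whose attribute a has value v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(samples, labels):
        for a, v in enumerate(x):
            value_counts[c][a][v] += 1
    return priors, class_counts, value_counts

def score(x, c, priors, class_counts, value_counts):
    """Return p(X|Ci) * p(Ci), with p(X|Ci) the product over attributes (Step-4)."""
    likelihood = 1.0
    for a, v in enumerate(x):
        # An unseen value gives a zero factor because no smoothing is applied.
        likelihood *= value_counts[c][a][v] / class_counts[c]
    return likelihood * priors[c]

def predict(x, priors, class_counts, value_counts):
    """Step-5: choose the class with the largest p(X|Ci) * p(Ci)."""
    return max(priors, key=lambda c: score(x, c, priors, class_counts, value_counts))

# Hypothetical toy data: (awareness of CO's, assignments, tutorials)
samples = [
    ("Yes", "Yes", "Yes"), ("Yes", "Yes", "No"), ("Yes", "No", "Yes"),
    ("No", "No", "No"), ("No", "Yes", "No"), ("No", "No", "Yes"),
]
labels = ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail"]

priors, class_counts, value_counts = train(samples, labels)
prediction = predict(("Yes", "Yes", "No"), priors, class_counts, value_counts)
print(prediction)
```

Because p(X) is the same for every class, the sketch compares only the numerators of the posterior in Step-3, which is what Step-5 prescribes.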

Proceedings of 3rd International Conference on Emerging Technologies in Computer Science & Engineering (ICETCSE 2016)
V. R. Siddhartha Engineering College, Vijayawada, India, October 17-18, 2016

Multi value attribute weights on a 0-1 scale

S.No          Basics               Ability to Concentrate    Content Perception
                                   in the Class
              S      Avg    W      S      Avg    W           S      Avg    W
Professor 1   0.50   0.18   0.0    0.60   0.32   0.0         0.75   0.50   0.0
Professor 2   0.45   0.22   0.0    0.55   0.28   0.0         0.85   0.48   0.0
Professor 3   0.65   0.20   0.0    0.65   0.30   0.0         0.80   0.52   0.0
Average       0.50   0.20   0.0    0.60   0.30   0.0         0.80   0.50   0.0

S: Strong   Avg: Average   W: Weak

TABLE 4: MULTI VALUE ATTRIBUTE WEIGHTS

Multi value attribute weights

S.No          A      B      C      D
Professor 1   0.90   0.72   0.50   0.0
Professor 2   0.88   0.68   0.8    0.0
Professor 3   0.92   0.70   0.52   0.0
Average       0.90   0.70   0.50   0.0

TABLE 5: MULTI VALUE ATTRIBUTE WEIGHTS

NOTE: WE DON'T TAKE ANY WEIGHTS FOR THE ATTENDANCE ATTRIBUTE

CALCULATING CONFUSION MATRIX

                     PREDICTED
                     Negative   Positive
Actual   NEGATIVE    a          b
         POSITIVE    c          d

TABLE 6: CONFUSION MATRIX

The entries in the confusion matrix have the following meaning in the context of our study:
a is the number of correct predictions that an instance is negative,
b is the number of incorrect predictions that an instance is positive,
c is the number of incorrect predictions that an instance is negative, and
d is the number of correct predictions that an instance is positive.

CALCULATING RECALL, PRECISION & SPECIFICITY

In this module the accuracy, precision, recall and specificity are calculated from the confusion matrix. Based on the table above, they are defined below.

Several standard terms have been defined for the two-class matrix.
The accuracy (AC) is the proportion of the total number of predictions that were correct. It is determined using the equation
$AC = (a + d)/(a + b + c + d)$.
The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified. It is determined using the equation
$TP = d/(c + d)$.
The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation
$FP = b/(a + b)$.
The true negative rate (TN), that is, the specificity, is the proportion of negative cases that were classified correctly, as calculated using the equation
$TN = a/(a + b)$.
The false negative rate (FN) is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation
$FN = c/(c + d)$.
Finally, precision (P) is the proportion of the predicted positive cases that were correct, as calculated using the equation
$P = d/(b + d)$.

Fig 3: Data set of 28 students

Fig 4: Preprocessed data set of the collected student data
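
The measures derived from the confusion matrix (accuracy, recall, false positive rate, true negative rate, false negative rate and precision) can be computed with a few lines of Python. The sketch below is only illustrative: the function name `confusion_metrics`, the dictionary keys and the counts a=50, b=10, c=5, d=35 are hypothetical and are not results from this study.

```python
def confusion_metrics(a, b, c, d):
    """Compute the two-class measures from the confusion-matrix entries.

    a: actual negative, predicted negative   b: actual negative, predicted positive
    c: actual positive, predicted negative   d: actual positive, predicted positive
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),   # AC
        "recall": d / (c + d),                   # TP rate
        "false_positive_rate": b / (a + b),      # FP rate
        "specificity": a / (a + b),              # TN rate
        "false_negative_rate": c / (c + d),      # FN rate
        "precision": d / (b + d),                # P
    }

# Hypothetical counts, used only to exercise the formulas.
metrics = confusion_metrics(a=50, b=10, c=5, d=35)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and the false negative rate sum to 1, as do specificity and the false positive rate, since each pair shares the same denominator.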
Fig 5: Comparison of Bayesian and Weighted Bayesian Classifier

CONCLUSION

In this paper, the classification task is applied to a student database to predict the students' division on the basis of the previous database. Although many approaches can be used for data classification, the Naïve Bayesian Classifier and the Weighted Naïve Bayesian Classifier are used here. Information such as attendance, class test, seminar and assignment marks was collected from the students' previous database to predict their performance at the end of the semester.

This study will help students and teachers to improve the division of the student. It will also help to identify those students who need special attention, so that the fail ratio can be reduced and appropriate action can be taken before the next semester examination. This can help the students improve in their academics, which eventually leads to good performance in their end examinations. As the stress on students is reduced, the suicide rate among students should also fall. This could contribute to the development of our country by providing it with good and efficient engineers.