Abstract: Data mining provides educational institutions with the capability to explore, visualize and analyze large amounts of data in order to reveal valuable patterns in students' learning behaviors. Turning raw data into useful information and knowledge also enables educational institutions to improve teaching and learning practices and to facilitate decision making in educational settings. Thus, educational data mining is becoming increasingly important, with a specific focus on exploiting the abundant data generated by various educational systems to enhance teaching, learning and decision making. In EDM, feature selection is the task of choosing a subset of input variables by eliminating irrelevant features. Feature selection algorithms have proven effective in enhancing learning efficiency, increasing predictive accuracy and reducing the complexity of learned results. The primary objective of this research work is to investigate the most relevant subset of features for achieving high predictive accuracy by adopting the Correlation based Feature Subset attribute evaluation and Gain Ratio attribute evaluation feature selection techniques. For classification, the Naïve Bayes classifier is implemented using the WEKA tool. The outcome shows effective predictive accuracy with a minimum number of attributes. The results also reveal that the selected features influence the classification process of the student performance model.
Keywords: Educational Data Mining (EDM), Classification algorithm, Naïve Bayes Algorithm, Feature Selection, Prediction.
Effective feature selection techniques are required to analyze efficient classification algorithms. This research work attempts to predict students' academic failure by reviewing various feature selection algorithms combined with the Naïve Bayes classifier.
This research work is structured as follows. Section 2 reviews the research that has been conducted in EDM. Section 3 describes the materials and methods of the domain of study. The description of the process of building the model, including data collection and the tools used, is given in Section 4. Section 5 then presents the experiments and the results obtained. Finally, the conclusion is given in Section 6.

II. RELATED WORKS

This section discusses some of the research work carried out by various researchers in the same field. A work done by Humera Shaziya et al. presents an approach to predict the performance of students in semester exams. This approach is based on a Naïve Bayes classifier. The objective is to know what grades students may obtain in their end-semester results. This helps the educational institute, teachers and students, i.e., all the stakeholders involved in an education system. Students and teachers can take the necessary actions to improve the results of those students whose predicted result is not satisfactory. A training dataset of students is taken to build the Naïve Bayes model. The model is then applied to the test data to predict the end-semester results of students. In this study, a number of attributes are considered to predict the grade of a student [1].

Another work, done by Tajunisha and Anjali, discusses predicting student performance using MapReduce. The authors introduced the MapReduce concept to improve accuracy and reduce time complexity. In this work, a deadline constraint is also introduced. Based on this, an extensional MapReduce Task Scheduling algorithm for Deadline constraints (MTSD) is proposed. It allows the user to specify a job's deadline (the classification process in data mining) and tries to make the job finish before the deadline. The proposed system achieves higher classification accuracy even on big data and also reduces time complexity [2]. Another study, by Mashael and Muna, focused on predicting students' final GPA using decision trees [3]. The authors applied the J48 decision tree algorithm to discover classification rules. They extracted useful knowledge and identified the most important courses in the students' study plan based on their grades in the mandatory courses.

A work carried out by Karthikeyan and Thangaraju proposed genetic algorithm and particle swarm optimization search techniques with correlation based feature selection for evaluation, and a Naïve Bayes classifier for classification. Accuracy and time are the outcomes of the classification model, and various measures such as sensitivity, specificity, precision and recall are also calculated [4]. A work carried out by Lumbini and Pravin [5] proposed an experiment that attempts to detect students' failure in order to improve their academic performance. They applied different approaches to resolve the problem of high dimensionality, using classification algorithms on an engineering students data set.

Predictive Analytics Using Data Mining Technique [6] by Hina Gulati presents data mining work on predicting the dropout feature of students. The author also applied some feature selection algorithms; the tool used for feature selection and mining is WEKA. Another work, by Jai and David, discusses the analysis of influencing factors in predicting students' performance using MLP, a comparative study [7]. This paper mainly focuses on analyzing the prediction accuracy of academic performance using influencing factors with the Multi Layer Perceptron algorithm, and compares the resulting prediction accuracy. Another research work, carried out by Anal and Devadatta, discusses the application of feature selection methods in educational data mining. Different feature selection algorithms are applied to their data set, and the best results are obtained by the Correlation Based Feature Selection algorithm with 8 features. Classification algorithms may then be applied to this feature subset for predicting student grades [8]. Another work by the same authors discusses early prediction of students' performance using machine learning techniques. In this paper, a set of attributes is first defined. Feature selection algorithms are then applied to the data set to reduce the number of features. Five classes of machine learning algorithms (MLA) are then applied to the data set, and it was found that the best results were obtained with the decision tree class of algorithms [9].

III. MATERIALS AND METHODS

A feature selection algorithm can be seen as the combination of a search technique for proposing new feature subsets with an evaluation measure that scores the different feature subsets. The simplest algorithm is to test each possible subset of features and find the one which minimizes the error rate. The choice of evaluation metric heavily influences the algorithm, and it is these evaluation metrics which distinguish between the three main categories of feature selection algorithms: wrappers, filters and embedded methods. Wrapper methods use a predictive model to score feature subsets. Filter methods use a proxy measure instead of the error rate to score a feature subset; this measure is chosen to be fast to compute while still capturing the usefulness of the feature set. Embedded methods perform feature selection as part of the model construction process.
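As a rough illustration of the filter category used in this work, the sketch below drives WEKA's attribute selection API with the two evaluators adopted here, CfsSubsetEval and GainRatioAttributeEval. The class names come from the WEKA library; the file name students.csv and the number of ranked attributes to keep are assumptions made only for the example, not details taken from the paper.

import java.util.Arrays;

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.GainRatioAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureSelectionSketch {
    public static void main(String[] args) throws Exception {
        // Load the student data set (file name is illustrative).
        Instances data = new DataSource("students.csv").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // class label assumed to be the last attribute

        // Filter approach 1: correlation based feature subset evaluation with a BestFirst search.
        AttributeSelection cfs = new AttributeSelection();
        cfs.setEvaluator(new CfsSubsetEval());
        cfs.setSearch(new BestFirst());
        cfs.SelectAttributes(data);
        System.out.println("CFS subset: " + Arrays.toString(cfs.selectedAttributes()));

        // Filter approach 2: Gain Ratio attribute evaluation with a Ranker search.
        AttributeSelection gainRatio = new AttributeSelection();
        gainRatio.setEvaluator(new GainRatioAttributeEval());
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(13);                      // assumed cut-off for the ranked list
        gainRatio.setSearch(ranker);
        gainRatio.SelectAttributes(data);
        System.out.println("Gain Ratio ranking: " + Arrays.toString(gainRatio.selectedAttributes()));
    }
}

The same configuration can of course be reached through the WEKA Explorer GUI; the programmatic form is shown only to make the evaluator/search pairing explicit.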
For the purpose of designing and evaluating our experiments, we have used WEKA. It is open source software which is freely available for mining data and implements a large collection of mining algorithms. It can accept data in various formats and has converters supported with it, so we have converted the student dataset into a CSV file. Under the "Test options", 10-fold cross-validation is selected as our evaluation process. The various performance metrics are discussed as follows.

The accuracy of the predictive model is calculated based on the true positive rate, false positive rate, precision and recall values [14].

TP rate (True Positive rate): a positive test result accurately reflects the tested activity. If the outcome from a prediction is p and the actual value is also p, then it is called a true positive (TP).

TP rate = TP / P, where P = TP + FN

TN (True Negative): a true negative occurs when both the prediction outcome and the actual value are n for an input instance.

Table 2 shows the results of applying the two feature selection algorithms.

Table 2: Best Selected Attributes

Algorithm                 Attributes Selected
cfsSubsetEval             Branch, SSG, FINC, PSM, GP, ATT
GainRatioAttributeEval    Age, branch, cat, SSG, medium, ATT, GP, FINC, FQUAL, MQUAL, HSG, SEM_P, LOC

A. Results of cfsSubset Evaluator

In this experiment, the Correlation Based Feature Selection algorithm is used to select 6 attributes, and the Naïve Bayes classifier is implemented on the reduced data set; the results are presented in Table 3. It shows that Naïve Bayes correctly classifies about 84.2% of instances under 10-fold cross validation. The true positive rate is high for the classes Second and First, whereas the TP rate is very low for the class Third. Fig. 1 shows the graphical representation of the classifier.
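The evaluation just described can also be reproduced in outline through WEKA's Java API rather than the Explorer GUI. The following sketch is only an approximation of the setup reported above: it assumes the reduced data set is available as a CSV file (the name students_cfs.csv is illustrative) with the class label as the last attribute, runs Naïve Bayes under 10-fold cross-validation, and prints the overall accuracy together with the per-class TP rates.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesCrossValidation {
    public static void main(String[] args) throws Exception {
        // Data set already reduced to the attributes chosen by the feature selector (file name assumed).
        Instances data = new DataSource("students_cfs.csv").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));   // 10-fold cross-validation, fixed seed

        System.out.printf("Accuracy: %.1f%%%n", eval.pctCorrect());
        for (int c = 0; c < data.numClasses(); c++) {
            System.out.printf("TP rate for class %-12s : %.3f%n",
                    data.classAttribute().value(c), eval.truePositiveRate(c));
        }
        System.out.printf("Weighted avg. TP rate: %.3f%n", eval.weightedTruePositiveRate());
    }
}

Fixing the random seed makes the fold assignment repeatable, so accuracy figures from different runs of the same configuration can be compared directly.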
Naïve Bayes

Algorithm    Second    Fail     First    Distinction    Third    Weighted Avg.
cfsSEval     0.888     0.333    0.759    0.25           0        0.842
GRAE         0.873     0        0.764    0.143          0        0.744
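A comparison like the one in the table above can also be produced in a single pipeline by wrapping the attribute selection step and the classifier in WEKA's FilteredClassifier, so that the selector is re-learned on every training fold. Whether the original experiments were run this way is not stated in the paper; the sketch below is a hedged illustration for the CFS configuration, and the file name is again assumed.

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.attribute.AttributeSelection;

public class SelectionPlusClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("students.csv").getDataSet();   // file name assumed
        data.setClassIndex(data.numAttributes() - 1);

        // Attribute selection filter: CFS subset evaluation with a BestFirst search.
        AttributeSelection filter = new AttributeSelection();
        filter.setEvaluator(new CfsSubsetEval());
        filter.setSearch(new BestFirst());

        // Wrap filter and Naive Bayes so selection is redone on every training fold.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(filter);
        fc.setClassifier(new NaiveBayes());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(fc, data, 10, new Random(1));
        System.out.printf("Weighted avg. TP rate: %.3f%n", eval.weightedTruePositiveRate());
    }
}

Replacing CfsSubsetEval and BestFirst with GainRatioAttributeEval and a Ranker search gives the corresponding Gain Ratio configuration in the same way.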