Performance Evaluation of Feature Selection Algorithms in Educational Data Mining

Abstract: Educational Data Mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. It has been shown in various studies, and in previous work by the authors, that data mining techniques find widespread application in the educational decision-making process for improving the performance of students in higher educational institutions. Classification techniques assume significant importance in machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the attributes of the dataset by removing redundant and irrelevant features. The aim of this research work is to compare, using the WEKA tool, the performance of various feature selection techniques in the prediction of students' performance in the final semester examination with different classification algorithms; in particular, J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study was collected from the students' performance reports of a private college in the Tamil Nadu state of India. The effectiveness of the various feature selection algorithms was compared over six classifiers and the results are discussed. The results of this study show that the accuracy of IBk is 99.680%, which is higher than that of the other classifiers over the CFS subset evaluator. It was also found that the overall accuracy of the CFS subset evaluator is higher than that of the other feature selection algorithms. Future work will concentrate on the implementation of a proposed hybrid method using a large dataset collected from many institutions.

Keywords: Educational data mining, Wrapper selection, Best First Search, Classification algorithms, Feature selection algorithms.

I. INTRODUCTION

In educational data mining, the prediction of students' performance has long been an interesting area of research, as it helps to identify weak students or students at risk. As educational institutions face intense competition with respect to admission, retention and sustenance, it is important that they pay significant attention to improving student outcomes. Most often, institutions are judged by the percentage of results produced by their students in the final end-semester examination. A data mining system offers several techniques to educational leaders to support the decision-making process and improve the quality of education. The large volume of data generated in educational institutions can be effectively used as a rich source of vital information to support decision-making systems. The main focus of this research work is to identify the best feature selection and classification algorithms for examining the performance of undergraduate students in an educational dataset. The objective is to find the best attributes by comparing the performance of various feature selection techniques in the prediction of students' performance in the final semester examination using different classification algorithms; J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The idea behind this research work is to identify slow learners, which helps the faculty give special attention to individual students so as to improve their academic performance.

A research work in the education sector was done by Parneet Kaur et al. Their work focuses on identifying the slow learners among students and applies feature selection algorithms to filter the desired potential variables using the WEKA tool; as a result, statistics are generated over all the classification algorithms in order to predict the accuracy [*]. Another work, by Hythem Hashim et al., discussed data mining methodologies for studying students' academic performance using the C4.5 algorithm. Their objective was to build a classification model that can be used to improve the students' academic records in the Faculty of Mathematical Science and Statistics; this model was built with C4.5 for predicting student performance in many different settings [1].

A work by Vaibhav and Rajendra, titled Classification and performance evaluation using data mining algorithms, collected student data from a polytechnic institute and classified the data using decision tree and Naïve Bayes algorithms; the authors compared the classification results with respect to different performance parameters [2]. Another research work, by Anjana and Jeena, discussed predicting college student dropout using EDM techniques. Here the WEKA tool was used to evaluate the attributes; various classification techniques, such as induction rules and decision trees, were applied to the data, and the results of these approaches were compared [3]. A paper titled "Performance Analysis and Prediction in Educational Data Mining: A Research Travelogue" by Pooja et al. addresses the usage of data mining techniques in the field of education and presents a comprehensive survey of educational data mining [4]. A work by Punlumjeak and Rachburee proposed a comparison of feature selection techniques, namely genetic algorithms, support vector machines, information gain, and minimum and maximum relevance algorithms, with supervised classifiers such as naïve Bayes, decision tree, k-nearest neighbour and neural networks. Their results show that the minimum and maximum relevance feature selection method with 10 features gives the best result of
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 05 Issue: 02 December 2016 Page No.131-139
ISSN: 2278-2419
91.12% accuracy with the k-nearest neighbour classifier [5]. Another work, by Anal and Devadatta, applied different feature selection algorithms to a student dataset. The best results were achieved by correlation-based feature selection with 8 features; classification algorithms may subsequently be applied to this feature subset for predicting student grades [6]. Komal and Supriya [7] conducted a survey on mining educational data to forecast failure of engineering students. Their paper reviews the available literature on educational data mining, classification methods and the different feature selection techniques that can be applied to a student dataset. The research paper titled Improvement on Classification Models of Multiple Classes through Effectual Processes is by Tarik [8].

That work focuses on improving the results of classification models with multiple classes via some effective techniques. The collected data are pre-processed, cleaned, filtered and normalized; the final data are balanced and randomized; and a technique combining the Naïve Bayes classifier and the Best First Search algorithm is then used to reduce the number of features in the datasets. Finally, a multi-class classification task is conducted with effective classifiers such as k-nearest neighbour, radial basis function networks, and artificial neural networks to forecast the students' performance. Another work, carried out by Sadaf and Kulkarni, discussed precognition of students' academic failure using data mining techniques; it proposes to pre-recognize students' academic failure by applying various data mining techniques, especially induction rules, decision trees and naïve Bayes [9].

Carlos et al. [10] have attempted to solve this problem of predicting students' academic failure using clustering, induction rule or decision tree algorithms; the authors applied five induction rule algorithms and five decision tree algorithms to the dataset. Sahil and Shweta carried out a study of the application of data mining and analytics in the education domain. Their paper is basically a study of certain research experiments which aim to examine the different applications of data mining techniques to educational data, and it also elaborates upon the state-of-the-art techniques in the field of educational analytics [11]. Ogunde and Ajibade developed a new system for the prediction of students' graduation grades based on entry results data; the proposed system uses the ID3 algorithm to classify the data and constructs the decision tree by employing a top-down, greedy search to test every attribute [12]. Dinesh and Radika have done a survey on predicting student academic performance in an educational environment, in which performance is predicted from psychological and environmental factors using different educational data mining techniques; the researchers also survey predictive models in data mining and current trends in prediction [13].

A work by Arpit Trivedi put forward a simple approach for categorizing student data using a decision tree based approach. To determine the category of a specific student, a frequency measure is used for feature extraction, and with a trained classifier the class of an unseen student is predicted automatically [14]. Agrawal and Gurav have done a review of data mining techniques used for educational systems; their paper is a survey which proposes to apply data mining techniques such as association rule mining and classification [15]. Classification is a data mining technique which takes a systematic approach to building classification models from an input dataset [16]. Some of the popular classifiers used to solve a classification problem are decision tree classifiers, rule-based classifiers, neural networks, support vector machines, and naïve Bayes classifiers [17]. A key objective of the learning algorithm is therefore to build a predictive model that accurately predicts the class labels of previously unknown records. This paper examines various classification algorithms, compares their performance using the WEKA software, and discusses the results. The open source data mining tool WEKA was used in the present work to obtain a reduced set of features from the available feature set using various feature selection techniques. The reduced attributes were then given as input to the classifiers, namely the decision tree algorithm C4.5 (J48), the Bayesian classifiers Naïve Bayes and BayesNet, the nearest neighbour algorithm (IBk) and the rule learners (OneR and JRip), to evaluate the performance of the classification algorithms under each feature selection technique.

This paper is structured as follows. Section 2 discusses the background of the study. Section 3 describes the various feature selection techniques used for reducing the attributes of the dataset. The statement of the problem is provided in Section 4. The details of the dataset generated for the study are presented in Section 5. The experimental evaluation and comparative analysis are given in Section 6, and the conclusion of the proposed work is given in Section 7. Finally, the vital references are listed in Section 8.

II. BACKGROUND

Feature selection has been an important field of research in data mining and machine learning systems. The primary objective of any feature selection technique is to choose a subset of the input variables by eliminating features which are redundant, irrelevant or of no predictive information [18]. Feature subset selection methods in machine learning can be broadly classified into three groups: filter, wrapper and embedded models [19]. Filter-based methods depend on the general characteristics of the training data; the feature selection process is thus carried out as a pre-processing step, independent of the learning algorithm. The wrapper technique depends on the learning algorithm, using it as a black box to evaluate the usefulness of subsets of variables in the prediction task; wrapper methods thus use a learning algorithm to evaluate candidate feature subsets, and they are computationally intensive. Embedded methods, on the other hand, perform feature selection during the training process of the classifier; these methods are specific to a given learning machine.

As the dimensionality of a domain expands, the number of features N increases. Finding an optimal feature subset is intractable, and problems related to feature selection have been proved to be NP-hard. At this juncture, it is essential to describe the traditional feature selection process, which consists of four basic steps, namely subset generation, subset evaluation,
stopping criterion, and validation. Subset generation is a search process that produces candidate feature subsets for evaluation based on a certain search strategy. Each candidate subset is evaluated and compared with the previous best one according to a certain evaluation criterion; if the new subset turns out to be better, it replaces the best one. This process is repeated until a given stopping condition is satisfied [20]. A number of studies have established, in theory and in practice, that feature selection is an effective technique for improving learning efficiency, enhancing predictive accuracy and minimizing the complexity of the results in data mining systems. The effectiveness of feature selection has been proved in many applications involving data mining and machine learning, such as text categorization [21], image retrieval [22], information retrieval [23], DNA microarray analysis [24], intrusion detection [25, 26], and music information retrieval [27].

III. STATEMENT OF THE PROBLEM

In this research work, the performance of various feature selection algorithms was evaluated with different classification algorithms using the students' academic performance dataset generated for the study. The proposed study made several comparisons to evaluate the effectiveness of the feature selection techniques using measures involving error and accuracy parameters. The overall aim of the study was to analyze the effectiveness of various machine learning algorithms in predicting students' performance in the end-semester examination. The dataset for the study included demographic details of the students such as gender, family size and type, income, parents' educational attainment and locality. In addition, pre-collegiate attributes of the students, such as their performance in the secondary and higher secondary classes, are also collected and maintained by the colleges. It could therefore be useful to the educational leaders and management of the colleges if the features in the currently available data can act as indicators for predicting the performance of the students. The major objective of this study is to analyze the students' data available in the degree colleges to identify any specific patterns that might be useful in the prediction of their performance in the university exams. The specific objective of the study is to classify students according to their performance in the final examination based on their personal and pre-collegiate characteristics.

IV. RESEARCH METHODOLOGY

In this research work, six classification algorithms are used, namely J48, Naïve Bayes, Bayes Net, IBk, OneR and JRip, along with four feature selection algorithms. In this section, the fundamentals of some of the feature selection algorithms are illustrated. In particular, the CfsSubset evaluation, Chi-Squared attribute evaluation, Information Gain attribute evaluation and Relief attribute evaluation algorithms, which are used in this research work, are described.

…uncorrelated with each other. Irrelevant features should be ignored because they will have low correlation with the class. Redundant features should be screened out, as they will be highly correlated with one or more of the remaining features. The acceptance of a feature will depend on the extent to which it predicts classes in areas of the instance space not already predicted by other features.

B. Best First Search Algorithm (BFS)
Best First Search is an approach that searches the attribute subset space using greedy hill climbing augmented with a backtracking facility. The amount of backtracking can be controlled by setting the number of consecutive non-improving nodes allowed. The approach may search in either direction: it can start with the empty set of attributes and search forward, or it can start with the full set of attributes and search backward [8]. Table 1 shows the best first search algorithm [28].

Table 1: Best first search algorithm

1. Begin with the OPEN list containing the start state, the CLOSED list empty, and BEST ← start state.
2. Let s = arg max e(x) (get the state from OPEN with the highest evaluation).
3. Remove s from OPEN and add it to CLOSED.
4. If e(s) ≥ e(BEST), then BEST ← s.
5. For each child t of s that is not in the OPEN or CLOSED list, evaluate t and add it to OPEN.
6. If BEST changed in the last set of expansions, go to 2.
7. Return BEST.

C. Wrapper Feature Selection
In the wrapper approach, feature subset selection is done using the induction algorithm as a black box. The feature subset selection algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function, and the accuracy of the induced classifiers is estimated using accuracy estimation techniques. Wrappers assign values to weight vectors and compare the performance of a learning algorithm under different weight vectors: the weights of the features are determined by how well the specific feature settings perform in classification learning, and the algorithm iteratively adjusts the feature weights based on this performance [29].

D. CfsSubset Evaluator (CSER)
This evaluator assesses the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between the features [7].

E. Chi-Squared Attribute Evaluator (CSAER)
ChiSquaredAttributeEval evaluates an attribute by computing the value of the chi-squared statistic with respect to the class [7].
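The chi-squared score used by CSAER can be illustrated with a small, standard-library-only sketch. This is a simplified re-implementation for nominal attributes, not WEKA's ChiSquaredAttributeEval, and the example data below are hypothetical.

```python
from collections import Counter

def chi_squared(feature, labels):
    """Chi-squared score of a nominal attribute against the class.

    Builds the observed attribute-value/class contingency counts and
    compares them with the counts expected under independence; a larger
    score means the attribute is more predictive of the class.
    """
    n = len(labels)
    f_counts = Counter(feature)               # marginal counts of attribute values
    c_counts = Counter(labels)                # marginal counts of class values
    observed = Counter(zip(feature, labels))  # joint counts
    score = 0.0
    for fv, fc in f_counts.items():
        for cv, cc in c_counts.items():
            expected = fc * cc / n
            diff = observed[(fv, cv)] - expected
            score += diff * diff / expected
    return score

# Hypothetical toy data: the first attribute separates pass/fail
# perfectly, the second is independent of the class.
y = ["fail"] * 4 + ["pass"] * 4
informative = chi_squared([0, 0, 0, 0, 1, 1, 1, 1], y)   # high score
irrelevant = chi_squared([0, 1, 0, 1, 0, 1, 0, 1], y)    # zero score
```

Ranking attributes by this score and keeping the top-scoring ones is exactly the filter behaviour described above: the evaluation depends only on the data, not on any learning algorithm.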
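The search procedure of Table 1 can be sketched in Python as a forward best-first search over attribute subsets. This is an illustrative sketch, not WEKA's implementation: the merit function `evaluate` and the default of five consecutive non-improving expansions are assumptions standing in for the subset evaluator and the backtracking setting described above.

```python
import heapq
from itertools import count

def best_first_search(n_features, evaluate, max_stale=5):
    """Forward best-first search over attribute subsets (cf. Table 1).

    OPEN is a max-heap of evaluated subsets (heapq is a min-heap, so
    scores are negated); CLOSED holds the subsets already generated.
    The search stops after max_stale consecutive expansions that fail
    to improve BEST -- the backtracking limit described in the text.
    """
    tie = count()                    # tie-breaker so the heap never compares sets
    start = frozenset()              # forward search: begin with the empty subset
    best, best_score = start, evaluate(start)
    open_list = [(-best_score, next(tie), start)]
    closed = {start}
    stale = 0
    while open_list and stale < max_stale:
        _, _, s = heapq.heappop(open_list)    # state with the highest evaluation
        improved = False
        for f in range(n_features):           # children: add one more attribute
            t = s | {f}
            if t in closed:
                continue
            closed.add(t)
            score = evaluate(t)
            heapq.heappush(open_list, (-score, next(tie), t))
            if score > best_score:            # step 4: update BEST
                best, best_score = t, score
                improved = True
        stale = 0 if improved else stale + 1  # step 6: did BEST change?
    return best, best_score

# Hypothetical merit function: features 0 and 2 are relevant,
# and every extra feature costs 0.1.
merit = lambda s: len(s & {0, 2}) - 0.1 * len(s - {0, 2})
subset, score = best_first_search(4, merit)
```

With this toy merit function the search grows subsets from the empty set, keeps expanding the most promising one, and stops once further expansions stop improving BEST, returning the subset {0, 2}.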
Classifier     TP Rate   Precision   Recall   F-Measure   ROC Area
J48            0.907     0.913       0.907    0.907       0.935
Naïve Bayes    0.959     0.96        0.956    0.959       0.988
Bayes Net      0.93      0.949       0.93     0.936       0.972
IBk            0.953     0.953       0.953    0.952       0.97
OneR           0.952     0.952       0.952    0.951       0.945
JRip           0.947     0.947       0.947    0.926       0.945

Figure 2: Results of CfsSubset Evaluator

Table 11: Classification results for Chi-Square Attribute Evaluation

Classifier     TP Rate   Precision   Recall   F-Measure   ROC Area
J48            0.865     0.875       0.865    0.85        0.878
Naïve Bayes    0.955     0.958       0.955    0.956       0.986
Bayes Net      0.921     0.931       0.921    0.925       0.98
IBk            0.955     0.959       0.955    0.956       0.953
OneR           0.983     0.984       0.983    0.983       0.967
JRip           0.972     0.973       0.972    0.971       0.954

The graph shows that the Naïve Bayes classifier attains the highest ROC value, and Bayes Net the second highest, when 12 features are used; we therefore deduce that Naïve Bayes has the optimal dimensionality on the student dataset. Among the classification algorithms, OneR has the maximum F-measure value.

Classifier     TP Rate   Precision   Recall   F-Measure   ROC Area
J48            0.977     0.977       0.977    0.976       0.955
Naïve Bayes    0.958     0.962       0.958    0.958       0.987
Bayes Net      0.915     0.938       0.915    0.923       0.969
IBk            0.97      0.972       0.97     0.97        0.98
OneR           0.982     0.982       0.982    0.981       0.966
JRip           0.964     0.965       0.964    0.962       0.963
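The per-classifier figures reported in the tables above follow the standard confusion-matrix definitions. The relationship between them can be checked with a minimal sketch; the counts below are made up for illustration and are not the paper's data.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    The recall of a class is the quantity WEKA reports as its TP rate,
    and the F-measure is the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Made-up counts for one class of one classifier's confusion matrix.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)   # all three equal 0.9 here
```

Because precision and recall are equal in this example, the F-measure coincides with them; in the tables above the three columns generally differ, which is why the F-measure is useful as a single summary.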
VIII. CONCLUSION