
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications

Volume: 05 Issue: 02 December 2016 Page No.131-139


ISSN: 2278-2419

Performance Evaluation of Feature Selection Algorithms in Educational Data Mining

T. Velmurugan¹, C. Anuradha²
¹Associate Professor, PG and Research Dept. of Computer Science, D.G. Vaishnav College, Chennai, India.
²Research Scholar, Bharathiar University, Coimbatore, India.
Email: [email protected]; [email protected]

Abstract: Educational Data Mining (EDM) is a prominent field concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings, and with using those methods to better understand students and the settings in which they learn. Various studies, including previous work by the authors, have shown that data mining techniques find widespread application in the educational decision-making process for improving the performance of students in higher educational institutions. Classification techniques assume significant importance among machine learning tasks and are mostly employed in prediction-related problems. In machine learning problems, feature selection techniques are used to reduce the attributes of the class variables by removing redundant and irrelevant features from the dataset. The aim of this research work is to compare the performance of various feature selection techniques, using the WEKA tool, in the prediction of students' performance in the final semester examination with different classification algorithms; in particular, J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip are used in this research work. The dataset for the study was collected from the students' performance reports of a private college in the Tamil Nadu state of India. The effectiveness of the various feature selection algorithms was compared across six classifiers and the results are discussed. The results of this study show that the accuracy of IBk is 99.680%, which is higher than that of the other classifiers over the CFS subset evaluator. It was also found that the overall accuracy of the CFS subset evaluator is higher than that of the other feature selection algorithms. Future work will concentrate on the implementation of a proposed hybrid method considering a large dataset collected from many institutions.

Keywords: Educational data mining, Wrapper selection, Best First Search, Classification algorithms, Feature selection algorithms.

I. INTRODUCTION

In educational data mining, prediction of students' performance has long been an interesting area of research, and it helps to identify weak students or students at risk. As educational institutions face intense competition with respect to admission, retention and sustenance, it is important that institutions pay significant attention to improving student outcomes. Most often, institutions are judged by the percentage of results produced by students in the final end-semester examination. A data mining system offers several techniques to educational leaders to support the decision-making process and improve the quality of education. The large volume of data generated in educational institutions can be effectively used as a rich source of vital information to support decision-making systems. The main focus of this research work is to identify the best feature selection and classification algorithms by examining undergraduate student performance in an educational dataset. The objective is to find the best attributes by comparing the performance of various feature selection techniques in the prediction of students' performance in the final semester examination, using different classification algorithms such as J48, Naïve Bayes, Bayes Net, IBk, OneR, and JRip. The idea behind this research work is to identify slow learners, which helps faculty give special attention to individual students to improve their academic performance.

A research work was done by Parneet Kaur et al. in the education sector. Their work focuses on identifying the slow learners among students and applies feature selection algorithms to filter the desired potential variables using the WEKA tool. As a result, statistics are generated based on all classification algorithms in order to predict the accuracy [*]. Another work by Hythem Hashim et al. discussed data mining methodologies to study students' academic performance using the C4.5 algorithm. Their objective was to build a classification model that can be used to improve students' academic records in the Faculty of Mathematical Science and Statistics. This model was built using C4.5 for predicting student performance in many different settings [1].

A work by Vaibhav and Rajendra, named "Classification and performance evaluation using data mining algorithms", collected student data from a polytechnic institute and classified the data using decision tree and Naïve Bayesian algorithms. They compared the results of classification with respect to different performance parameters [2]. Another research work by Anjana and Jeena discussed predicting college student dropout using EDM techniques. Here the WEKA tool was used to evaluate the attributes. Various classification techniques, such as induction rules and decision trees, were applied to the data and the results of each of these approaches were compared [3]. A paper titled "Performance Analysis and Prediction in Educational Data Mining: A Research Travelogue" by Pooja et al. addresses the usage of data mining techniques in the field of education and presents a comprehensive survey of educational data mining [4]. A work by Punlumjeak and Rachburee proposed a comparison of feature selection techniques, namely genetic algorithms, support vector machine, information gain, and minimum-maximum relevance algorithms, with supervised classifiers such as Naïve Bayes, decision tree, k-nearest neighbour and neural network. Their results show that the minimum-maximum relevance feature selection method with 10 features gives the best result of
91.12% accuracy with the k-nearest neighbour classifier [5]. Another work by Anal and Devadatta applied different feature selection algorithms to a student dataset. The best results were achieved by correlation-based feature selection with 8 features; classification algorithms may subsequently be applied to this feature subset for predicting student grades [6]. Komal and Supriya [7] conducted a survey on mining educational data to forecast failure of engineering students. Their paper provides a review of the available literature on educational data mining, classification methods and the different feature selection techniques that can be applied to a student dataset. The research paper titled "Improvement on Classification Models of Multiple Classes through Effectual Processes" is by Tarik [8].

That work focuses on improving the results of classification models with multiple classes via some effective techniques. The collected data are pre-processed, cleaned, filtered and normalized, the final data is balanced and randomized, and then a combination of the Naïve Bayes classifier and the Best First Search algorithm is used to reduce the number of features in the datasets. Finally, a multi-classification task is conducted with effective classifiers such as K-Nearest Neighbor, Radial Basis Function, and Artificial Neural Network to forecast the students' performance. Another work, carried out by Sadaf and Kulkarni, discussed precognition of students' academic failure using data mining techniques. That research paper proposes to pre-recognize students' academic failure using various data mining techniques; in particular, induction rules, decision trees and Naïve Bayes are applied [9].

Carlos et al. [10] attempted to solve the problem of predicting students' academic failure using clustering algorithms, induction rules or decision tree algorithms. The authors applied five rule-induction and five decision tree algorithms to the dataset. Sahil and Shweta carried out a study of the application of data mining and analytics in the education domain. Their paper is basically a study of certain research experiments which aim to examine the different applications of data mining techniques to educational data; it also elaborates on the state-of-the-art techniques in the field of educational analytics [11]. Ogunde and Ajibade developed a new system for the prediction of students' graduation grades based on entry results data. The proposed system uses the ID3 algorithm to classify the data and construct the decision tree by employing a top-down, greedy search to test every attribute [12]. Dinesh and Radika surveyed the prediction of student academic performance in educational environments, where performance based on psychological and environmental factors is predicted using different educational data mining techniques. The researchers also surveyed predictive models in data mining and current trends in data mining prediction [13].

A work by Arpit Trivedi put forward a simple approach for categorizing student data using a decision-tree-based approach. For determining the category of a specific student, a frequency measure is used for feature extraction. Using the trained classifier, they predicted the class for an unseen student automatically [14]. Agrawal and Gurav presented a review of data mining techniques used for educational systems. Their paper is based on a survey which proposes to apply data mining techniques such as association rule mining and classification techniques [15]. Classification is a data mining technique which includes a systematic approach to building classification models from an input dataset [16]. Some of the popular classifiers used to solve a classification problem are decision tree classifiers, rule-based classifiers, neural networks, support vector machines, and naive Bayes classifiers [17]. Therefore, a key objective of the learning algorithm is to build a predictive model that accurately predicts the class labels of previously unknown records. This paper examines various classification algorithms; their performance is compared using the WEKA software and the results are discussed. The open source data mining tool WEKA was used in the present work to obtain a reduced set of features from the available feature set using various feature selection techniques. In addition, the reduced attributes were given as input to classifiers such as the decision tree algorithm C4.5 (J48), the Bayesian classifiers Naïve Bayes and BayesNet, the nearest neighbour algorithm (IBk) and the rule learners (OneR and JRip), to evaluate the performance of the classification algorithms for each feature selection technique.

This paper is structured as follows. Section 2 discusses the background of the study. Section 3 describes the various feature selection techniques used for reducing the attributes of the dataset. The statement of the problem is provided in Section 4. The details of the dataset generated for the study are presented in Section 5. The experimental evaluation and comparative analysis are given in Section 6, and the conclusion of the proposed work is given in Section 7. Finally, vital references are mentioned in Section 8.

II. BACKGROUND

Feature selection has been an important field of research in data mining and machine learning systems. The primary objective of any feature selection technique is to choose a subset of features of the input variables by eliminating those features which are redundant, irrelevant or of no predictive information [18]. Feature subset selection in machine learning can be broadly classified into three groups: filter, wrapper and embedded models [19]. Filter-based feature selection depends on the general characteristics of the training data; thus, the feature selection process is carried out as a pre-processing step, independent of the learning algorithm. The wrapper technique depends on the learning algorithm and uses it as a black box to evaluate the usefulness of subsets of variables in the prediction task; thus, wrapper methods use a learning algorithm to evaluate each candidate subset of features, and they are computationally intensive. Embedded methods, on the other hand, perform feature selection during the training process of the classifier; these methods are specific to a given learning machine.

As the dimensionality of a domain expands, the number of features N increases. Finding an optimal feature subset is intractable, and problems related to feature selection have been proved to be NP-hard. At this juncture, it is essential to describe the traditional feature selection process, which consists of four basic steps, namely subset generation, subset evaluation, stopping criterion, and validation.

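The four basic steps named above can be sketched as a single search loop. The following is an illustrative sketch only, with a hypothetical merit function supplied by the caller; it is not the implementation used in the study or in WEKA.

```python
import itertools

def select_features(features, evaluate, max_stall=8):
    """Generic filter-style feature subset search.

    Illustrates the four steps: subset generation (enumerate candidates),
    subset evaluation (evaluate()), stopping criterion (max_stall
    consecutive non-improving candidates), with validation left to the
    caller. evaluate() scores a candidate subset; higher is better.
    """
    best_subset, best_score = (), float("-inf")
    stall = 0
    # Subset generation: enumerate candidate subsets by increasing size.
    for size in range(1, len(features) + 1):
        for candidate in itertools.combinations(features, size):
            score = evaluate(candidate)           # subset evaluation
            if score > best_score:
                best_subset, best_score = candidate, score
                stall = 0
            else:
                stall += 1
            if stall >= max_stall:                # stopping criterion
                return list(best_subset), best_score
    return list(best_subset), best_score

# Hypothetical merit function for illustration: prefer subsets that
# contain "PSM" and "ATT", with a small penalty on subset size as a
# crude redundancy proxy.
def merit(subset):
    return sum(f in ("PSM", "ATT") for f in subset) - 0.1 * len(subset)

subset, score = select_features(["PSM", "ATT", "FSIZE", "MEDIUM"], merit)
```

The validation step would then check the chosen subset against a held-out portion of the data, which is the role the classifiers play in later sections.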
Subset generation is a search process that produces candidate feature subsets for evaluation based on a certain search strategy. Each candidate subset is evaluated and compared with the previous best one according to a certain evaluation criterion. If the new subset turns out to be better, it replaces the best one. This process is repeated until a given stopping condition is satisfied [20]. A number of studies have established in theory and practice that feature selection is an effective technique for improving learning efficiency, enhancing predictive accuracy and minimizing the complexity of results in data mining systems. The effectiveness of feature selection has been proved in many applications involving data mining and machine learning, such as text categorization [21], image retrieval [22], information retrieval [23], DNA microarray analysis [24], intrusion detection [25, 26], and music information retrieval [27].

III. STATEMENT OF THE PROBLEM

In this research work, the performance of various feature selection algorithms was evaluated with different classification algorithms using the students' academic performance dataset generated for the study. The proposed study made several comparisons to evaluate the effectiveness of the feature selection techniques using measures involving error and accuracy parameters. The overall aim of the study was to analyze the effectiveness of various machine learning algorithms in predicting students' performance in the end-semester examination. The dataset for the study included the demographic details of the students, such as gender, family size and type, income, parents' educational attainment and locality. In addition, pre-collegiate conditions of the students, such as their performance in secondary and higher secondary classes, are also collected and maintained in the colleges. Thus, it could be useful to the educational leaders and management of the colleges if the features in the currently available data can act as indicators for predicting the performance of the students. The major objective of this study is to analyze the students' data available in the degree colleges to identify any specific patterns that might be useful in the prediction of their performance in the university exams. The specific objective of the study is to classify students according to their performance in the final examination based on their personal and pre-collegiate characteristics.

IV. RESEARCH METHODOLOGY

In this research work, six classification algorithms are used, namely J48, Naïve Bayes, Bayes Net, IBk, OneR and JRip, along with four feature selection algorithms. In this section, the fundamentals of some of the feature selection algorithms are illustrated. Furthermore, the algorithms CfsSubset Evaluation, Chi-Squared Attribute Evaluation, Information Gain Attribute Evaluation and Relief Attribute Evaluation, which are used in this research work, are also described.

A. Correlation-based Feature Selection (CFS)
Correlation-based Feature Selection (CFS) is a simple filter algorithm that ranks feature subsets according to a correlation-based heuristic evaluation function [28]. In CFS, the bias of the evaluation function is toward subsets that contain features that are highly correlated with the output to be predicted and uncorrelated with each other. Irrelevant features should be ignored because they will have low correlation with the class. Redundant features should be screened out as they will be highly correlated with one or more of the remaining features. The acceptance of a feature will depend on the extent to which it predicts classes in areas of the instance space not already predicted by other features.

B. Best First Search Algorithm (BFS)
Best First Search is an approach that searches the attribute subset space via greedy hill climbing augmented with a backtracking facility. The amount of backtracking can be controlled by setting the number of consecutive non-improving nodes allowed. This approach may search in either direction: it can start with the empty set of attributes and search forward, or it can start with the full set of attributes and search backward [8]. Table 1 shows the best first search algorithm [28].

Table 1: Best first search algorithm

1. Begin with the OPEN list containing the start state, the CLOSED list empty, and BEST ← start state.
2. Let s = arg max e(x) (get the state from OPEN with the highest evaluation).
3. Remove s from OPEN and add to CLOSED.
4. If e(s) ≥ e(BEST), then BEST ← s.
5. For each child t of s that is not in the OPEN or CLOSED list, evaluate and add to OPEN.
6. If BEST changed in the last set of expansions, go to 2.
7. Return BEST.

C. Wrapper Feature Selection
In the wrapper approach, the feature subset selection is done using the induction algorithm as a black box. The feature subset selection algorithm conducts a search for a good subset using the induction algorithm itself as part of the evaluation function, and the accuracy of the induced classifiers is estimated using accuracy estimation techniques. Wrappers are based on a hypothesis: they assign values to weight vectors and compare the performance of a learning algorithm under different weight vectors. In the wrapper method, the weights of features are determined by how well the specific feature settings perform in classification learning, and the algorithm iteratively adjusts the feature weights based on this performance [29].

D. CfsSubset Evaluator (CSER)
It evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them [7].

E. Chi-Squared Attribute Evaluator (CSAER)
ChiSquaredAttributeEval evaluates an attribute by computing the value of the chi-squared statistic with respect to the class [7].

F. Information Gain Attribute Evaluator (IGAER)
It evaluates an attribute by measuring the information gain with respect to the class: InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute) [7].
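The best first search procedure of Table 1 can be sketched as follows. This is an illustrative re-implementation over attribute subsets under a hypothetical evaluation function, not WEKA's BestFirst class; the forward direction (starting from the empty set) is shown.

```python
def best_first_search(attributes, evaluate, max_stale=5):
    """Forward best first search over attribute subsets (cf. Table 1).

    OPEN maps candidate subsets (frozensets) to their scores; evaluate()
    is the subset evaluation function e(.) with higher meaning better.
    max_stale bounds consecutive non-improving expansions (backtracking
    control).
    """
    start = frozenset()                        # forward search: empty set
    open_list = {start: evaluate(start)}       # step 1: OPEN and BEST
    closed = set()
    best, best_score = start, open_list[start]
    stale = 0
    while open_list and stale < max_stale:
        # Step 2: take the state in OPEN with the highest evaluation.
        s = max(open_list, key=open_list.get)
        s_score = open_list.pop(s)             # step 3: move s to CLOSED
        closed.add(s)
        if s_score > best_score:               # step 4: update BEST
            best, best_score = s, s_score
            stale = 0
        else:
            stale += 1
        # Step 5: expand s by adding one unused attribute at a time.
        for a in attributes:
            child = s | {a}
            if child != s and child not in closed and child not in open_list:
                open_list[child] = evaluate(child)
    return set(best), best_score               # step 7: return BEST

# Hypothetical evaluation function: reward "PSM" and "ATT", penalise size.
score = lambda s: sum(f in ("PSM", "ATT") for f in s) - 0.1 * len(s)
subset, merit = best_first_search(["PSM", "ATT", "FINC", "LW"], score)
```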
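The information gain quantity defined above can be computed directly from frequency counts. The sketch below is for illustration only; the handful of records is made up, and this is not WEKA's InfoGainAttributeEval implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H over a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    n = len(labels)
    conditional = 0.0
    for v in set(attr_values):
        # H(Class | Attribute = v), weighted by the frequency of v.
        sub = [lab for a, lab in zip(attr_values, labels) if a == v]
        conditional += len(sub) / n * entropy(sub)
    return entropy(labels) - conditional

# Hypothetical records: attendance grade vs. end-semester result class.
att = ["Good", "Good", "Poor", "Poor", "Average", "Good"]
esm = ["First", "First", "Fail", "Fail", "Second", "First"]
gain = info_gain(att, esm)
```

In this toy example attendance perfectly determines the ESM class, so the conditional entropy is zero and the gain equals H(Class); a useless attribute would instead give a gain near zero.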
G. Relief Attribute Evaluator (RAER)
It evaluates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instance of the same and of a different class; it can operate on both discrete and continuous class data [7].

Table 2: Description of options and capability of the CfsSubset Evaluator

Option | Description
locallyPredictive | Identify locally predictive attributes. Iteratively adds attributes with the highest correlation with the class as long as there is not already an attribute in the subset that has a higher correlation with the attribute in question.
missingSeparate | Treat missing as a separate value.

Capability | Supported
Class | Missing class values, numeric class, nominal class, date class, binary class
Attributes | Empty nominal attributes, nominal attributes, numeric attributes, unary attributes, date attributes, binary attributes, missing values
Min # of instances | 1

Table 3: Description of options and capability of the Chi-Squared Attribute Evaluator

Option | Description
binarizeNumericAttributes | Only binarize numeric attributes instead of properly discretizing them.
missingMerge | Distribute the counts for missing values across the other values in proportion to their frequency; otherwise, missing is treated as a separate value.

Capability | Supported
Class | Missing class values, nominal class, binary class
Attributes | Empty nominal attributes, nominal attributes, numeric attributes, unary attributes, date attributes, binary attributes, missing values
Min # of instances | 1

Table 4: Description of options and capability of the Info Gain Attribute Evaluator

Option | Description
binarizeNumericAttributes | Just binarize numeric attributes rather than properly discretizing them.
missingMerge | Distribute the counts for missing values over the other values in proportion to their frequency; otherwise, missing is treated as a separate value.

Capability | Supported
Class | Missing class values, nominal class, binary class
Attributes | Empty nominal attributes, nominal attributes, numeric attributes, unary attributes, date attributes, binary attributes, missing values
Min # of instances | 1

Table 5: Description of options and capability of the Relief Attribute Evaluator

Option | Description
numNeighbors | Number of nearest neighbours used for attribute estimation.
sampleSize | Number of instances to sample; the default (-1) indicates that all instances are used for attribute estimation.
seed | Random seed for sampling instances.
sigma | Sets the influence of the nearest neighbours; used in an exponential function to control how quickly weights decrease for more distant instances. Used in conjunction with weightByDistance; sensible values are 1/5 to 1/10 of the number of nearest neighbours.
weightByDistance | Weight the nearest neighbours by their distance.

Capability | Supported
Class | Nominal class, date class, missing class values, numeric class, binary class
Attributes | Empty nominal attributes, nominal attributes, numeric attributes, unary attributes, date attributes, binary attributes, missing values
Min # of instances | 1

V. EXPERIMENTAL DATA

A students' dataset was generated based on the demographic characteristics, admission data and pre-collegiate features of the students. In addition, performance-related measures were gathered from class and university examinations. The data mining classification algorithms compared in the study include the Naive Bayes and Bayes Net classifiers [30], OneR, the J48 decision tree algorithm, which is an open source Java implementation of the C4.5 algorithm [31], IBk, and the JRip algorithm [32].
The information used in this study was collected from students enrolled in Bachelor degree programs at three reputed Arts and Science Colleges in the state of Tamil Nadu, affiliated to Thiruvalluvar University, in the year 2014. In total, data on 610 students with 21 attributes were collected through a questionnaire, and the collected data were organized in a Microsoft Excel sheet.

The target variable was the Student End Semester Mark (ESM), which was originally in numeric form as a percentage. It was discretized using pre-processing filters into four categories: First Class (score > 60%), Second Class (45-60%), Third Class (36-45%) and Fail (< 36%). Each student record had the following attributes. The personal data included gender, category of admission, living location, family size and type, annual income of the family, and the father's and mother's qualifications. The attributes referring to the students' pre-college characteristics included the student's grade in High School and in Senior Secondary School. The attributes describing other college features include the student's branch of study, place of stay, previous semester mark, class test performance, seminar performance, assignment, general proficiency, class attendance and performance in the laboratory work. The study was limited to students' data collected from three Arts and Science Colleges in Tamil Nadu. The detailed description of the dataset is provided in Table 6.

Table 6: Description of the attributes used for classification

Variable | Description | Possible Values
Gender | Student's sex | {Male, Female}
Branch | Student's branch | {BCA, B.Sc, B.Com, B.A}
Cat | Student's category | {BC, MBC, MSC, OC, SBC, SC}
HSG | Student's grade in High School | {O: 90%-100%, A: 80%-89%, B: 70%-79%, C: 60%-69%, D: 50%-59%, E: 35%-49%, Fail: <35%}
SSG | Student's grade in Senior Secondary School | {O: 90%-100%, A: 80%-89%, B: 70%-79%, C: 60%-69%, D: 50%-59%, E: 35%-49%, Fail: <35%}
Medium | Medium of instruction | {Tamil, English, others}
LLoc | Living location of student | {Village, Taluk, Rural, Town, District}
HOS | Whether the student stays in a hostel | {Yes, No}
FSize | Student's family size | {1, 2, 3, >3}
FType | Student's family type | {Joint, Individual}
FINC | Family annual income | {poor, medium, high}
FQual | Father's qualification | {no-education, elementary, secondary, UG, PG, Ph.D}
MQual | Mother's qualification | {no-education, elementary, secondary, UG, PG, Ph.D, NA}
PSM | Previous semester mark | {First: >60%, Second: 45%-60%, Third: 36%-45%, Fail: <36%}
CTG | Class test grade | {Poor, Average, Good}
SEM_P | Seminar performance | {Poor, Average, Good}
ASS | Assignment | {Yes, No}
GP | General proficiency | {Yes, No}
ATT | Attendance | {Poor, Average, Good}
LW | Lab work | {Yes, No}
ESM | End semester marks | {First: >60%, Second: 45%-60%, Third: 36%-45%, Fail: <36%}

VI. EXPERIMENTAL SETUP

The main objective of this research is to study the impact of feature selection techniques on the classification task, so that classification performance can be improved in the prediction of student performance for the dataset generated in the study. The classification models were built using different algorithms, namely Naive Bayes, BayesNet, OneR, IBk, JRip and J48; the WEKA application was used for this purpose. Each classifier is applied with two testing options: cross-validation (using 10 folds and applying the algorithm 10 times, each time with 9 of the folds used for training and 1 fold for testing) and percentage split (2/3 of the dataset used for training and 1/3 for testing). The feature selection algorithms try to select those attributes which have the greatest impact on the students' academic status. The feature selection algorithms used in this study are as follows: CfsSubsetEval, ChiSquaredAttributeEval, InfoGainAttributeEval, and ReliefAttributeEval. Table 7 shows the best attributes selected by the feature selection algorithms using the WEKA software tool.

Table 7: Reduction of attributes using feature selection algorithms

Feature Subset Algorithm | Attributes | No. of Attributes
Without Feature Selection | Sex, Branch, Cat, SSG_Grade, HSG_Grade, Medium, LOC, HOS, FSIZE, FTYPE, FINC, FQUAL, MQUAL, PSM, CTG, SEM_P, ASS, GP, ATT, LW, ESM | 21
CSER | Branch, SSG_Grade, FINC, PSM, GP, ATT, ESM | 7
CSAER | PSM, Branch, FINC, ATT, SSG_Grade, LW, FSTAT, CTG, Medium, GP, Sex, ESM | 12
IGAER | PSM, Branch, FINC, LW, ATT, FSTAT, SSG_Grade, Medium, CTG, ESM | 10
RAER | Branch, PSM, FINC, LW, Medium, CTG, FSTAT, ESM | 8
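The two testing options described in Section VI (10-fold cross-validation and a 2/3 to 1/3 percentage split) can be sketched as index partitions over the records. This is an illustrative sketch of the splitting scheme only, not WEKA's evaluation code.

```python
import random

def ten_fold_indices(n, seed=1):
    """Yield (train, test) index lists for 10-fold cross-validation:
    each round, 9 folds are used for training and 1 fold for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]          # 10 near-equal folds
    for k in range(10):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, test

def percentage_split(n, train_frac=2 / 3, seed=1):
    """Hold-out split: 2/3 of the records for training, 1/3 for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

# With the 610 student records used in this study:
splits = list(ten_fold_indices(610))
train, test = percentage_split(610)
```

A classifier would then be trained and scored once per fold, and the reported measures (accuracy, F-measure, ROC area) averaged over the 10 runs.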
VII. RESULTS AND DISCUSSION

The performance of the model depends strongly on the selection of the best attributes from the list of attributes in the student dataset. The present investigation focuses on the different feature selection algorithms used in data preprocessing, and the effectiveness of each algorithm is presented in terms of different measures. For assessing goodness, the Receiver Operating Characteristic (ROC) value can be used; the ROC value represents the trade-off between the false positive and false negative rates. The F-Measure, another measure of effectiveness, is the harmonic mean of precision and recall. The evaluation measures, with their variations in ROC value and F-Measure, are generated with the open source data mining tool WEKA.

The average F-measure was computed for each feature selection technique with each of the classification algorithms; the F-measure reflects the predictive accuracy of the classifier. Ten-fold cross-validation was used for this purpose. The results are summarized in Table 8. The average F-measure was also calculated without performing feature selection. The results clearly show that CSER (0.972) has outperformed the other techniques using 7 attributes. RAER has also produced a good F-Measure (0.961) with 8 attributes.

Table 8: Average F-Measure for each feature subset

Feature Subset | Naive Bayes | Bayes Net | OneR | IBK | JRip | J48 | Average F-Measure
Without feature selection | 0.943 | 0.924 | 0.915 | 0.953 | 0.939 | 0.944 | 0.936
CSER | 0.996 | 0.969 | 0.980 | 0.997 | 0.984 | 0.909 | 0.972
CSAER | 0.956 | 0.925 | 0.983 | 0.956 | 0.97 | 0.851 | 0.940
IGAER | 0.959 | 0.936 | 0.951 | 0.952 | 0.946 | 0.907 | 0.941
RAER | 0.958 | 0.923 | 0.981 | 0.972 | 0.966 | 0.97 | 0.961

A. Without Feature Selection Algorithm (21 Attributes)
In the present study, the classifiers were first applied to the dataset without any feature selection algorithm (WFS); the results are shown in Table 9. The results reveal that the true positive rate is high for IBk and OneR, while it is low for the classifier Bayes Net. The precision is high for the OneR classifier and very low for the classifier IBk.

Table 9: Classification results for WFS

Classifier | TP Rate | Precision | F-Measure | ROC Area
J48 | 0.95 | 0.95 | 0.944 | 0.969
Naïve Bayes | 0.98 | 0.03 | 0.943 | 0.998
Bayes Net | 0.94 | 0.953 | 0.924 | 0.987
IBk | 0.985 | 0.029 | 0.953 | 0.98
OneR | 0.985 | 0.985 | 0.915 | 0.971
JRip | 0.97 | 0.971 | 0.939 | 0.745

Figure 1: Results of WFS

B. CfsSubsetEval with Best First Search Algorithm (7 Attributes)
The present study then applied the CfsSubset Evaluator; the results are presented in Table 10. The table shows that the IBk classifier correctly classifies about 99.680% of instances under 10-fold cross-validation testing. It also shows that the true positive rate is high for Bayes Net. The precision value is high for Naïve Bayes and low for J48.

Table 10: Classification results for the CfsSubset Evaluator

Classifier | TP Rate | Precision | Recall | F-Measure | ROC Area
J48 | 0.965 | 0.96 | 0.965 | 0.909 | 0.968
Naïve Bayes | 0.986 | 0.998 | 0.996 | 0.996 | 0.986
Bayes Net | 0.997 | 0.996 | 0.967 | 0.969 | 0.987
IBk | 0.99 | 0.99 | 0.99 | 0.997 | 0.984
OneR | 0.980 | 0.980 | 0.980 | 0.980 | 0.983
JRip | 0.985 | 0.985 | 0.985 | 0.984 | 0.987

From Fig. 3, we observe that the classifiers Bayes Net and JRip have the highest ROC value of 0.987 with 7 attributes. The generated macro-averaged F-measure attains a maximum of 0.997 for the classifier IBk.
136
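Since all of the results above are reported through WEKA's evaluation output, it is worth making the measures concrete. Below is a minimal sketch, not taken from the paper, of how precision, recall, their harmonic mean (the F-measure) and a Table 8-style average F-measure can be computed from raw counts; the function names and example counts are illustrative only.

```python
# Illustrative only: these helpers mirror the measures reported in
# Tables 8-13 (precision, recall, F-measure); they are not WEKA code.

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (the F-measure)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_f1(scores):
    """Average F-measure across classifiers, as in the last column of Table 8."""
    return sum(scores) / len(scores)

# Hypothetical confusion-matrix counts: 90 true positives,
# 10 false positives, 10 false negatives.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.9 0.9 0.9

# The "Without feature selection" row of Table 8 averages the six
# classifiers' F-measures to 0.936.
print(round(average_f1([0.943, 0.924, 0.915, 0.953, 0.939, 0.944]), 3))  # → 0.936
```

WEKA reports these measures per class and also as a class-frequency-weighted average; the sketch shows only the binary, unweighted case.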
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 05 Issue: 02 December 2016 Page No.131-139
ISSN: 2278-2419
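Section B pairs CfsSubsetEval with a Best First search over attribute subsets. As a rough sketch of the idea (not WEKA's implementation), the search can be viewed as repeatedly adding whichever attribute most improves a subset merit score, stopping when no addition helps; in Hall's CFS the merit favours attributes correlated with the class but uncorrelated with each other. The merit function, relevance numbers and stopping rule below are hypothetical stand-ins; only the attribute names are taken from the study.

```python
# Hypothetical sketch of the subset search behind CfsSubsetEval + Best
# First; greedy hill-climbing stands in for the full Best First search.

def greedy_forward_selection(attributes, merit):
    """Add the attribute that most improves merit; stop when none does."""
    selected, best_merit = [], 0.0
    while True:
        candidates = [a for a in attributes if a not in selected]
        if not candidates:
            break
        score, attr = max((merit(selected + [a]), a) for a in candidates)
        if score <= best_merit:  # no candidate improves the subset
            break
        selected.append(attr)
        best_merit = score
    return selected

# Toy merit: reward per-attribute relevance, penalize redundancy per
# pair, loosely in the spirit of the CFS merit. Numbers are invented.
relevance = {"ESM": 0.9, "SSG_Grade": 0.7, "CTG": 0.4, "Medium": 0.1}

def toy_merit(subset):
    pairs = len(subset) * (len(subset) - 1) / 2
    return sum(relevance[a] for a in subset) - 0.3 * pairs

print(greedy_forward_selection(list(relevance), toy_merit))  # → ['ESM', 'SSG_Grade']
```

WEKA's Best First additionally keeps a queue of partially explored subsets and can search backward or bidirectionally; the loop above is the simplest forward variant.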
Figure 2: Results of CfsSubset Evaluator

C. Chi-Square AttributeEval and Ranker (12 Attributes)

In the present study the Chi-Square Attribute Evaluator was implemented, and the results are presented in Table 11. The results in Table 11 reveal that the true positive rate and precision are high for the OneR classifier and low for J48. It can be verified that OneR correctly classifies about 98.314% of instances under 10-fold cross-validation testing.

Table 11: Classification results for Chi-Square Attribute Evaluation

Classifier     TP Rate  Precision  Recall  F-Measure  ROC Area
J48            0.865    0.875      0.865   0.85       0.878
Naïve Bayes    0.955    0.958      0.955   0.956      0.986
Bayes Net      0.921    0.931      0.921   0.925      0.98
IBk            0.955    0.959      0.955   0.956      0.953
OneR           0.983    0.984      0.983   0.983      0.967
JRip           0.972    0.973      0.972   0.971      0.954

Figure 3: Results of Chi-Square Attribute Evaluation

The graph shows that the classifier Naïve Bayes attains the highest ROC value and Bayes Net the second highest with 12 features, so we deduce that Naïve Bayes has the optimal dimensionality in the student data set. Among the classification algorithms, OneR has the maximum F-measure value.

D. Information Gain Attribute Evaluator and Ranker (10 Attributes)

In the present study the Information Gain Attribute Evaluator was applied to the data set in the WEKA environment, and the results are shown in Table 12. With the feature subset selected by Information Gain Attribute Evaluation, the Naïve Bayes classifier correctly classifies about 95.930% of instances under 10-fold cross-validation testing. The results in Table 12 also reveal that the true positive rate and precision are high for Naïve Bayes and low for J48.

Table 12: Classification results for InfoGain Attribute Evaluation

Classifier     TP Rate  Precision  Recall  F-Measure  ROC Area
J48            0.907    0.913      0.907   0.907      0.935
Naïve Bayes    0.959    0.96       0.956   0.959      0.988
Bayes Net      0.93     0.949      0.93    0.936      0.972
IBk            0.953    0.953      0.953   0.952      0.972
OneR           0.952    0.952      0.952   0.951      0.945
JRip           0.947    0.947      0.947   0.926      0.945

Figure 4: Results of InfoGain Attribute Evaluator

We observe from Fig. 4 that the classifier Naïve Bayes has the highest ROC value, and its F-measure is also the maximum, with 10 attributes. Therefore, Naïve Bayes achieves relatively good performance on this classification task.

E. Relief Attribute Evaluation and Ranker (8 Attributes)

The results in Table 13 show that the classifier OneR correctly classifies about 98.181% of instances under 10-fold cross-validation testing on the data set, and its true positive rate is also high. The table also shows that the precision and true positive rate are low for Bayes Net.

Table 13: Classification results for Relief Attribute Evaluation

Classifier     TP Rate  Precision  Recall  F-Measure  ROC Area
J48            0.977    0.977      0.977   0.976      0.955
Naïve Bayes    0.958    0.962      0.958   0.958      0.987
Bayes Net      0.915    0.938      0.915   0.923      0.969
IBk            0.97     0.972      0.97    0.97       0.98
OneR           0.982    0.982      0.982   0.981      0.966
JRip           0.964    0.965      0.964   0.962      0.963

A graphical representation of Table 13 is shown in Fig. 5. We observe that Naïve Bayes attains the maximum ROC value, 0.987, with 8 features, while the F-measure is highest for the classifier OneR.
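The evaluators in sections C–E (Chi-Square, InfoGain, Relief) differ from CfsSubsetEval in that each scores attributes individually and leaves the ordering to the Ranker. For the information-gain case, a self-contained sketch on invented toy records follows: the attribute names are borrowed from the study's feature list, but the data and resulting scores are hypothetical.

```python
# Hypothetical sketch of InfoGainAttributeEval + Ranker, not WEKA code:
# score each attribute by the entropy reduction it gives on the class,
# then sort attribute indices by that score.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Class entropy minus class entropy after splitting on one attribute."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(y)
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# Toy records (Medium, ESM) with a Pass/Fail class; the data is invented.
rows = [("English", "High"), ("English", "Low"), ("Tamil", "High"), ("Tamil", "Low")]
labels = ["Pass", "Fail", "Pass", "Fail"]

# Rank attribute indices by information gain, highest first.
ranking = sorted(range(2), key=lambda i: info_gain(rows, labels, i), reverse=True)
print(ranking)  # → [1, 0]: ESM is fully informative here, Medium is not
```

Chi-Square and Relief would plug different per-attribute scores into the same ranking step.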
Figure 5: Results of Relief Attribute Evaluation

The results of the comparative study of the six classifiers, carried out against the feature subsets generated by the four feature selection algorithms, are shown below in the form of a table and a graph. In Table 14 we observe that IBk comes up with quite good rules for characterizing the structure of the data set: IBk has the highest accuracy (99.680%) under CfsSubsetEval (CSER), and Naïve Bayes shows the second highest accuracy (98.569%) among the six classifiers. Furthermore, CfsSubsetEval gives higher accuracy for all six classifiers than the Chi-Square, InfoGain and Relief attribute evaluators.

Table 14: Accuracy of Classifiers over Feature Selection Algorithms

Classifier     WFS     CSER    CSAER   IGAER   RAER
J48            94.974  96.515  86.516  90.697  90.674
Naïve Bayes    97.989  98.569  95.505  95.930  95.757
Bayes Net      93.969  97.670  92.134  93.023  91.515
IBk            98.492  99.680  95.405  95.255  96.969
OneR           98.442  98.112  98.314  95.254  98.181
JRip           96.984  98.478  97.191  94.674  96.363

Figure 6: Performance of classifiers over the Feature Selection Algorithms

VIII. CONCLUSION

This research work analyses the impact of feature selection techniques on the classification task, using the feature selection algorithms WFS, CSER, CSAER, IGAER and RAER on a student dataset collected from three arts and science colleges. Models built on the resulting reduced data sets are faster than models built with no feature selection. The main concern of this work is to classify students' performance in the end-semester examination based on their results and personal characteristics. Various rank-based feature selection filters were applied to the data sets to identify the best feature selection algorithm. Based on the results obtained, the CFS Subset Evaluator performed better than the other three feature selection algorithms. Among the classification algorithms, IBk yields 99.68% accuracy, which is better than the Naive Bayes, BayesNet, OneR and J48 algorithms. Future work will concentrate on the implementation of a proposed hybrid method on a large dataset collected from many institutions.