Evaluation of Data Mining Classification Models
Abdalla EL-HABIL and Mohammed El-Ghareeb
Abstract

This study focused on predictive models for classification. A simulation study was carried out, and it showed that the neural network, support vector machine, and k-nearest neighbor classifiers gave fairly high classification results. In this study the classifiers were also applied to three datasets that differ in size, some relatively large and some of medium size; these datasets also differ in the type of their independent variables (quantitative, ordinal, or nominal) and in the number of classes of their dependent variable (binary or multi-class). To estimate the classification accuracy of these models, the classifiers were applied with two methods of measuring effectiveness, hold-out validation and 10-fold cross-validation, and the main criterion in evaluating these classifiers and predictive models was the overall classification accuracy. The study revealed some differences between these classifiers in the accuracy of classifying the binary dependent variable in the relatively large dataset, the three-class dependent variable in the medium-sized dataset, and the seven-class dependent variable in the medium-sized dataset, when using both hold-out validation and 10-fold cross-validation to measure the effectiveness of the classifiers. The support vector machine model proved to be the best among these competing classifiers, as it showed the highest accuracy in classifying those variables under both methods of measuring effectiveness. The study also showed that the 10-fold cross-validation method increased the effectiveness and accuracy of these classifiers, and the results in the simulation study and the real data cases were fairly close.
1. INTRODUCTION
Data mining extends traditional statistical methods, allowing the development of new techniques that deal with more complicated data types and meet the needs of advanced data analysis. Data mining methods and algorithms serve statistics in several tasks, such as classification, prediction, and clustering. Several data mining techniques and predictive models are available for the classification task; these techniques are called classification models, or classifiers. This study concentrates on identifying and evaluating five techniques commonly used for classification, namely the decision tree, neural network, support vector machine (SVM), naive Bayes, and k-nearest neighbor classifiers, together with well-known methods of validation: hold-out validation, 10-fold cross-validation, and bootstrapping. These classifiers will be validated and evaluated according to their empirical performance through a comparative case study.
One of the main studies with a similar approach is by Aftarczuk (2007), which showed that it is very difficult to name a single data mining algorithm as the most suitable for medical data. Kiang (2003) considered the data mining classification techniques of neural networks and decision tree models, together with three statistical methods (linear discriminant analysis, logistic regression analysis, and k-nearest neighbor), to investigate their relative performance.
a. Decision Trees
Decision trees classify instances by sorting them down the tree from the
root node to a leaf node, which provides the classification of these instances.
Each node in the tree specifies a test of some attribute of the instance, and
each branch descending from that node corresponds to one of the possible
values for this attribute. An instance is classified by starting at the root node
of the decision tree, testing the attribute specified by this node, then moving
down the tree branch corresponding to the value of the attribute. This
process is then repeated at the node on this branch and so on until a leaf
node is reached, which holds the class prediction for that instance. Decision
trees can easily be converted to classification rules (Mitchell, 1997).
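To make the procedure concrete, the following is a minimal sketch in R of fitting and reading a classification tree; it assumes the rpart package is installed and uses the Iris data (analyzed later in this study) as a stand-in for any labeled dataset.

    # Fit a classification tree; each node prints an attribute test and
    # each leaf holds a class prediction, so the tree reads as rules.
    library(rpart)
    fit <- rpart(Species ~ ., data = iris, method = "class")
    print(fit)

    # Classify instances by sorting them down the tree from root to leaf.
    pred <- predict(fit, iris, type = "class")
    table(pred, iris$Species)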
b. Neural Networks
Neural networks, also called artificial neural networks (ANNs), are one of the most famous predictive models used for classification. An artificial neural network is a system based on the operation of biological neural networks; in other words, it is an emulation of a biological neural system. Artificial neural networks were born after McCulloch and Pitts introduced a set of simplified neurons in 1943. These neurons were represented as models of biological networks, turned into conceptual components for circuits that could perform computational tasks. An artificial neural network is developed with a systematic step-by-step procedure which optimizes a criterion commonly known as the learning rule. The input/output training data are fundamental for these networks, as they carry the information necessary to discover the optimal operating point. There are various neural network architectures; the most successful applications in classification and prediction have been multilayer feed-forward networks. The layer where input patterns are applied is the input layer; the layer from which an output response is desired is the output layer. Layers between the input and output layers are known as hidden or transfer layers, because their outputs are not readily observable.
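As an illustration, here is a brief sketch in R of a multilayer feed-forward network with one hidden layer; it assumes the nnet package is installed, and the choice of 5 hidden units is arbitrary.

    # Feed-forward network: input layer -> hidden layer -> output layer.
    library(nnet)
    set.seed(1)
    # size gives the number of hidden ("transfer") units, whose outputs
    # are not readily observable; only the output-layer response is used.
    fit <- nnet(Species ~ ., data = iris, size = 5, maxit = 200, trace = FALSE)
    pred <- predict(fit, iris, type = "class")
    mean(pred == iris$Species)   # training accuracy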
c. Support Vector Machines
Support Vector Machines (SVM) are one of the most recent data mining techniques used for classification, developed by Cortes and Vapnik in 1995 for binary classification (Cortes and Vapnik, 1995). SVMs were developed in the framework of statistical learning theory (Vapnik, 1998) and have been successfully applied to a number of applications, ranging from time series prediction to face recognition to biological data processing for medical diagnosis (Evgeniou et al., 1999). SVM classification finds the hyperplane where the margin between the support vectors is maximized. If a classification problem has a two-class dependent variable with two predictors, then the points of each class can easily be plotted in a two-dimensional space.
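A minimal sketch of this two-class, two-predictor situation in R follows; it assumes the e1071 package, and the toy data are simulated here purely for illustration.

    # Binary SVM with two predictors; a linear kernel searches for the
    # maximum-margin separating line between the two classes.
    library(e1071)
    set.seed(1)
    x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
               matrix(rnorm(40, mean = 2), ncol = 2))
    y <- factor(rep(c("A", "B"), each = 20))
    fit <- svm(x, y, kernel = "linear")
    fit$index                  # the support vectors that define the margin
    table(predict(fit, x), y)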
b. Hold-Out Validation
The hold-out validation method is sometimes called test sample estimation. A natural approach is to split the available data into two non-overlapping parts: one for training and the other for testing. The test data are held out and not looked at during training (Sahu and Mishra, 2011). The method partitions the data into two mutually exclusive subsets; it is common to designate 2/3 of the data as the training set and the remaining 1/3 as the test set (Kohavi, 1995). Hold-out validation avoids the overlap between training data and test data, yielding a more accurate estimate of the generalization performance of the algorithm. The downside is that this procedure does not use all the available data, and the results are highly dependent on the choice of the training/test split. The instances chosen for inclusion in the test set may be too easy or too difficult to classify (Zadeh et al., 2009).
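The split itself is simple; a short sketch in R follows, assuming the rpart package for the classifier and using the common 2/3 - 1/3 designation described above.

    # Hold-out validation: a random 2/3 training / 1/3 test partition.
    set.seed(1)
    n <- nrow(iris)
    train_idx <- sample(n, size = round(2 * n / 3))
    train <- iris[train_idx, ]
    test  <- iris[-train_idx, ]      # held out, never seen in training

    library(rpart)
    fit  <- rpart(Species ~ ., data = train, method = "class")
    pred <- predict(fit, test, type = "class")
    mean(pred == test$Species)       # hold-out accuracy estimate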
c. K-fold Cross-Validation
In k-fold cross-validation, the actual data are first partitioned into k equally (or nearly equally) sized segments, or folds. Subsequently, k iterations of training and validation are performed such that within each iteration a different fold of the data is held out for validation while the remaining k - 1 folds are used for learning. In data mining, 10-fold cross-validation (k = 10) is the most common choice and serves as a standard procedure for performance estimation and model selection (Kohavi, 1995). It is the basic form of cross-validation: the idea is to divide the data set into k folds and evaluate the errors. The most common criterion for evaluating a classification task is the accuracy (the percentage of correctly classified cases). Several other well-known validation methods can be seen as special cases of k-fold cross-validation, depending on the chosen value of k. Leave-one-out cross-validation (LOOCV) is the special case of k-fold cross-validation where k equals the number of instances in the data. In other words, in each iteration all the data except a single observation are used for training, and the model is tested on that single observation (k = n, with n - 1 observations used for training). An accuracy estimate obtained using LOOCV is known to be almost unbiased, but it has high variance, leading to unreliable estimates (Efron, 1983). It is still widely used when the available data are very scarce, especially in bioinformatics, where only dozens of data samples may be available (Zadeh et al., 2009).
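The following sketch in R writes the 10-fold procedure out by hand (again assuming the rpart package for the classifier), so that each fold is held out exactly once:

    # 10-fold cross-validation: each fold validates once while the
    # remaining k - 1 folds are used for learning.
    set.seed(1)
    k <- 10
    n <- nrow(iris)
    folds <- sample(rep(1:k, length.out = n))   # random fold labels

    library(rpart)
    acc <- numeric(k)
    for (i in 1:k) {
      train <- iris[folds != i, ]
      test  <- iris[folds == i, ]
      fit <- rpart(Species ~ ., data = train, method = "class")
      acc[i] <- mean(predict(fit, test, type = "class") == test$Species)
    }
    mean(acc)    # cross-validated accuracy estimate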
d. Bootstrap
The bootstrap family was introduced by Efron (1983) and is fully described in Efron and Tibshirani (1993). Given a dataset of size n, a bootstrap sample is created by sampling n instances from the data (with replacement). Since the dataset is sampled with replacement, the probability of any given instance not being chosen after n samples is (1 - 1/n)^n ≈ e^(-1) ≈ 0.368; the expected number of distinct instances from the original dataset appearing in the bootstrap (training) sample is thus 0.632n (Kohavi, 1995). As mentioned above, methods such as leave-one-out cross-validation (LOOCV) and bootstrapping are beyond the scope of our analysis, so we have given the reader only a brief idea of them.
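A single bootstrap estimate can be sketched in R as follows (rpart again stands in for the classifier); the fraction of instances left out of the sample should come out near the 0.368 derived above.

    # One bootstrap replicate: sample n instances with replacement for
    # training; instances never drawn form the (out-of-bag) test set.
    set.seed(1)
    n   <- nrow(iris)
    idx <- sample(n, replace = TRUE)
    mean(!(1:n %in% idx))            # close to e^(-1) = 0.368

    library(rpart)
    fit  <- rpart(Species ~ ., data = iris[idx, ], method = "class")
    test <- iris[-unique(idx), ]     # the instances not chosen
    mean(predict(fit, test, type = "class") == test$Species)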
2.3 Evaluation of a Classification Model
Evaluation of classifiers and predictive models is one of the key points in any data mining process. The main and most frequently desired evaluation criterion from a classification perspective is the overall accuracy obtained by a model validation method. One well-known way of visualizing predictive performance is to construct a confusion matrix: a specific matrix layout that shows the predicted and actual classifications, allowing the performance of an algorithm to be visualized. A confusion matrix is of size L x L, where L is the number of classes. The matrix is a valuable tool because it not only shows how frequently the model correctly predicted a value, but also shows which other values the model most frequently predicted incorrectly.
General form of the confusion matrix

Let N_ij be the number of elements in the population, of size N, which are really of type j but are classified as being of type i. The matrix {N_ij} is usually represented as:

                           True class, j
                       1      2     ...     L
    Assigned     1   N_11   N_12    ...   N_1L
    class, i     2   N_21   N_22    ...   N_2L
                ...   ...    ...    ...    ...
                 L   N_L1   N_L2    ...   N_LL
The diagonal elements of this matrix are the counts of the correct classifications of each type. Several performance metrics of a classification model can be obtained from the confusion matrix. The main and most important one is the accuracy of the classification model, which is the proportion of correctly classified instances. Conversely, the classification error produced by the classifier, usually known as the misclassification error, can also be obtained from the confusion matrix and serves as another performance criterion.
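In R, the confusion matrix and the two criteria just described can be obtained directly with table(); the tiny vectors below are hypothetical and serve only to show the layout.

    # Rows are assigned classes i, columns are true classes j.
    actual    <- factor(c("a", "a", "b", "b", "b", "c", "c"))
    predicted <- factor(c("a", "b", "b", "b", "c", "c", "c"),
                        levels = levels(actual))
    cm <- table(Assigned = predicted, True = actual)
    print(cm)

    accuracy <- sum(diag(cm)) / sum(cm)  # proportion correctly classified
    error    <- 1 - accuracy             # misclassification error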
3. SIMULATION STUDY
We performed a simulation study to evaluate the five proposed data
mining classifiers described previously: Decision tree, neural networks,
support vector machine, naïve Bayes, and k-Nearest Neighbor.
We drew random samples from two different normal distributions with different mean vectors but equal covariance matrices; we used the identity matrix as the covariance matrix. The two groups were generated from N(1, 1) for the first group and N(0, 1) for the second. The data sets were generated to observe the impact of changes in the sample size, the categorization, and the correlation matrices between the predictors.
Sample size: The samples are simulated from normal distributions with the same covariance matrix and different mean vectors, and are divided equally into 2 classes. These simulations are based on the R function mvrnorm, from the R package MASS, for simulating from a multivariate normal distribution. Five different sample sizes (100, 200, 300, 400, and 500) are generated to observe the impact of changes related to sample size.
Correlation: We used strong and weak correlation matrices in order to evaluate the performance of the different classifiers in the presence of multicollinearity and to examine the effect of correlation between the explanatory variables. For this purpose, two simulated samples with two predictors, having correlation coefficients of 0.25 and 0.90, were used for every simulated data set.
Categorization: After sampling, the normally distributed variables are categorized and divided into a certain number of categories of equal size, to assess each method in terms of the number of categories. Four different numbers of categories are considered (2, 3, 4, and 10).
All possible combinations of the mentioned sample sizes, correlations, and numbers of categories are considered. This process is repeated 2000 times to achieve the convergence criterion. For each simulation replication, 10-fold cross-validation was performed to evaluate the performance of each classification method. The average of those 2000 correct classification rates is then taken to estimate the true classification rate of the different classifiers.
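One replication of this design can be sketched in R as follows; it assumes the MASS package for mvrnorm (as in the study) and, for brevity, uses the class package's k-nearest neighbor as the representative classifier, with the illustrative values n = 200, weak correlation, and k = 3 neighbors.

    library(MASS)    # mvrnorm, as used in the study
    library(class)   # knn, one of the five classifiers

    set.seed(1)
    n   <- 200
    rho <- 0.25                                # weak correlation
    Sigma <- matrix(c(1, rho, rho, 1), 2, 2)

    # Two equally sized classes with different means, equal covariance.
    x <- rbind(mvrnorm(n / 2, mu = c(1, 1), Sigma = Sigma),
               mvrnorm(n / 2, mu = c(0, 0), Sigma = Sigma))
    y <- factor(rep(1:2, each = n / 2))

    # One 10-fold cross-validated accuracy; averaging such rates over
    # 2000 replications gives the estimates reported in the tables.
    folds <- sample(rep(1:10, length.out = n))
    acc <- sapply(1:10, function(i) {
      mean(knn(x[folds != i, ], x[folds == i, ], y[folds != i], k = 3) ==
             y[folds == i])
    })
    mean(acc)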
Part of the simulation results is presented below. Tables 3.1, 3.2, and 3.3 show the data mining (DM) classifiers' overall accuracies versus the sample size, the correlation, and the categorization.
Table 3.1: Overall Classification Accuracy versus the Sample Size, with weak correlation and no. of categories = 3

Sample size     DT      Neural Networks    SVM     k-Nearest Neighbor    Naïve Bayes
100            0.820        0.845         0.868          0.820              0.840
200            0.840        0.850         0.874          0.830              0.855
400            0.865        0.865         0.880          0.855              0.865
500            0.875        0.884         0.885          0.860              0.870
Table 3.2: Overall Classification Accuracy versus the Correlation, with n = 200 and no. of categories = 4

Correlation     DT      Neural Networks    SVM     k-Nearest Neighbor    Naïve Bayes
0.90           0.785        0.850         0.820          0.790              0.820
0.25           0.840        0.880         0.890          0.830              0.865
Table 3.3: Overall Classification Accuracy versus the Categorization, with n = 300 and strong correlation

No. of categories     DT      Neural Networks    SVM     k-Nearest Neighbor    Naïve Bayes
2                    0.840        0.885         0.878          0.830              0.850
3                    0.835        0.855         0.860          0.825              0.855
4                    0.805        0.840         0.850          0.835              0.825
10                   0.740        0.755         0.760          0.750              0.730
According to the simulation results for the effect of sample size shown in Table 3.1, the variation in sample size has a similar effect on almost all the methods: as the sample size increases, the classification accuracy increases.
When looking at the effect of the presence of multicollinearity on the performance of a method, we can see from Table 3.2 that all the methods show a significant improvement in performance in the absence of multicollinearity. The performances of the neural network, k-nearest neighbor, and support vector machine classifiers are superior to those of the other methods at either correlation level.
When looking at the effect of the number of categories on the performance of a method, it can be seen from Table 3.3 that as the number of categories increases, the classification accuracy decreases for all the methods; the classification performance of each method for a binary categorical variable is superior to its performance for a categorical variable with more than two classes.
From the simulation results in the different situations (sample size, categorization, and correlation), it seems that neural networks and k-nearest neighbor gave the highest averaged classification accuracies.
4. NUMERICAL EXAMPLES
4.1 Data Description
We used three different datasets, as shown in Table 4.1. These datasets differ from each other in size, in the number and type of predictors, and in the number of classes of their dependent variable; they have binary and multi-class categorical dependent variables.
Table 4.1: A brief description of the properties of the datasets

Dataset         Size    No. of variables    Type of predictors    No. of classes
Diabetics       1566          11                  Mixed                 2
Iris             150           5                  Numeric               3
Fish Species     159           6                  Numeric               7
In the Fish dataset, 159 fish of 7 species were caught, and their weight and body dimensions were measured. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail, from the nose to the notch of its tail, and from the nose to the end of its tail. The height and width are calculated as percentages of the third length variable. This results in 6 observed variables: Weight, Length1, Length2, Length3, Height, and Width. Observation 14 has a missing value for the variable Weight; therefore, this observation is usually excluded from the analysis. The 7 species are 1 = Bream, 2 = Whitewish, 3 = Roach, 4 = Parkki, 5 = Smelt, 6 = Pike, 7 = Perch. A brief description of this data set is given in Table 4.4.
Table 4.4: Fish dataset description

Dataset size:           159 records (1 missing)
Number of attributes:   7
Dependent variable:     7-class categorical variable
                          Bream       35 cases
                          Parkki      11 cases
                          Perch       56 cases
                          Pike        17 cases
                          Roach       20 cases
                          Smelt       14 cases
                          Whitewish    6 cases
No. of predictors:      6 variables (numeric attributes)
Here we use this dataset to classify the seven species of fish according to their body measurements.
The same analysis was then carried out with another validation method used to measure the classifiers' accuracies. Table 4.6 shows the classification accuracies obtained by using the hold-out validation method.
Table 4.6: Overall Classification Accuracy (using Hold-Out Validation)

Dataset      DT      Neural Networks    SVM     Naïve Bayes    k-Nearest Neighbor
Diabetics   0.787         0.767        0.787       0.737             0.772
Iris        0.96          0.98         0.98        0.96              0.98
Fish        0.827         0.865        0.865       0.692             0.728
5. Conclusion
After viewing the results and comparing them, we may conclude the
following:
There are slight differences between the classifier accuracies obtained with the 10-fold cross-validation method on the Diabetics dataset. However, we may consider SVM the most accurate classifier, since it gave the highest rate among the competing classifiers; we note, though, that the accuracy for DT (0.776) is very close to the accuracy for SVM (0.778). The same holds for the Iris dataset, where neural networks and k-nearest neighbor were the most accurate classifiers. When these classifiers are assigned to the Fish dataset, we can see some differences in the accuracy rates; in this case SVM was the perfect classifier, giving 100% overall accuracy. We may attribute this to the characteristics and properties of the datasets used, since the Fish dataset has a seven-class, unbalanced categorical dependent variable. Therefore, we may conclude that SVM and neural networks are suitable classifiers for such a case. Under the other scenario, where the hold-out validation method was used to measure the classifiers' performance, both the decision tree and SVM gave the highest classification accuracies when assigned to the Diabetics dataset. In the case of the Iris dataset, neural networks, SVM, and k-nearest neighbor all gave the highest classification accuracy.
REFERENCES
1. Sahu, B. and Mishra, D. (2011), Performance of feed forward neural network for a novel feature selection approach, IJCSIT, 2(4), 1414-1419.
2. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984), Classification and Regression Trees, Wadsworth.
3. Cortes, C. and Vapnik, V. (1995), Support-vector networks, Machine Learning, 20, 273-297.
4. Efron, B. (1983), Estimating the error rate of a prediction rule: improvement on cross-validation, Journal of the American Statistical Association, 78, 316-331.
5. Efron, B. and Tibshirani, R. J. (1993), An Introduction to the Bootstrap, Chapman & Hall, New York.
6. Jain, A. K., Dubes, R. C. and Chen, C. (1987), Bootstrap techniques for error estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5), 628-633.
7. Aftarczuk, K. (2007), Evaluation of selected data mining algorithms implemented in medical decision support systems, Master's thesis.
8. Kiang, M. (2003), A comparative assessment of classification methods, Decision Support Systems, 35, 441-454.
9. Kohavi, R. (1995), A study of cross-validation and bootstrap for accuracy estimation and model selection, in Proceedings of the International Joint Conference on Artificial Intelligence.
10. Larson, S. (1931), The shrinkage of the coefficient of multiple correlation, Journal of Educational Psychology, 22, 45-55.
11. Lim, N. (2007), Classification by Ensembles from Random Partitions using Logistic Regression Models, Ph.D. thesis.
Web References
Web1: Journal of Statistics Education, Fish Catch Data Set, http://www.amstat.org/publications/jse/datasets/fishcatch.txt, accessed August 2006.