
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 20, NO. 5, MAY 2009

A Novel Template Reduction Approach for the K-Nearest Neighbor Method

Hatem A. Fayed and Amir F. Atiya

Manuscript received August 24, 2007; revised December 21, 2008; accepted March 11, 2009. First published April 21, 2009; current version published May 01, 2009.
H. A. Fayed is with the Department of Engineering Mathematics and Physics, Cairo University, Cairo 12613, Egypt (e-mail: [email protected]).
A. F. Atiya is with the Department of Computer Engineering, Cairo University, Cairo 12613, Egypt (e-mail: [email protected]).
Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2009.2018547

Abstract: The K-nearest neighbor (KNN) rule is one of the most widely used pattern classification algorithms. For large data sets, the computational demands for classifying patterns using KNN can be prohibitive. A way to alleviate this problem is through the condensing approach, which removes patterns that are more of a computational burden but do not contribute to better classification accuracy. In this brief, we propose a new condensing algorithm. The proposed idea is based on defining the so-called chain, a sequence of nearest neighbors from alternating classes. We make the point that patterns further down the chain are close to the classification boundary, and based on that we set a cutoff for the patterns we keep in the training set. Experiments show that the proposed approach effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN. Moreover, it is a simple and fast condensing algorithm.

Index Terms: Condensing, cross validation, editing, K-nearest neighbor (KNN), template reduction.

I. INTRODUCTION

The K-nearest neighbor (KNN) classification rule is one of the most well-known and widely used nonparametric pattern classification methods. Its simplicity and effectiveness have led it to be widely used in a large number of classification problems, including handwritten digits, satellite image scenes, and medical diagnosis [1]–[5]. For KNN, however, two major outstanding problems are yet to be resolved by the research community. The first issue is the selection of the best K (the number of neighbors to consider), a choice that is greatly affected by the finite-sample nature of the problem. The second issue is the computational and storage cost: the traditional KNN rule requires storing the whole training set, which may be an excessive amount of storage for large data sets and leads to a large computation time in the classification stage. There are two well-known families of procedures for reducing the number of prototypes (sometimes referred to as template reduction techniques). The first approach, called editing, processes the training set with the aim of increasing generalization capability. This is accomplished by removing prototypes that contribute to the misclassification rate, for example, outlier patterns or patterns that are surrounded mostly by others of different classes [6]–[8]. The second approach is called condensing. Its aim is to obtain a small template that is a subset of the training set without changing the nearest neighbor decision boundary substantially. The idea is that the patterns near the decision boundary are crucial to the KNN decision, whereas those far from the boundary do not affect the decision. Therefore, a systematic removal of these ineffective patterns helps to reduce the computation time. This can be established by reducing the number of prototypes that are centered in dense areas of the same class [9]–[20].


In this brief, we consider only the condensing approach. Below is a short summary of some existing condensing algorithms.

In 1968, Hart [9] was the first to propose an algorithm for reducing the size of the stored data for the nearest neighbor decision (the algorithm is called CNN). Hart defined a consistent subset of the data as one that classifies the remaining data correctly with the nearest neighbor rule. He built this consistent set by sequentially adding to it data points from the training set as long as the added data point is misclassified (using the 1-NN rule). By construction, the resulting reduced subset classifies all the training data correctly. Empirical results have shown that Hart's CNN rule considerably reduces the size of the training set at the expense of minimal or even no degradation in classification performance. The drawback of CNN is that it frequently keeps some points that are far from the decision boundary. To combat this, in 1972, Gates [10] proposed what he called the reduced nearest neighbor rule (RNN). This method is based on first applying CNN and then performing a postprocessing step, in which the data points in the consistent set are revisited and removed if their deletion does not result in misclassifying any point in the training set. Experimental results confirmed that RNN yields a slightly smaller training subset than that obtained with CNN. In [11], Bhattacharya et al. proposed two methods, one based on the Voronoi graph and the other based on the Gabriel graph. The methods have the merit that they are exact and yield sets independent of the order in which the data are processed. The method based on the Voronoi graph yields a condensed set which is both training-set consistent (i.e., it classifies all the training data correctly) and decision-boundary consistent (i.e., it determines exactly the same decision boundary as that of the entire training set). However, it suffers from a large complexity due to the need to construct the Voronoi diagram. On the other hand, the method based on the Gabriel graph is faster, but it is neither decision-boundary consistent nor training-set consistent. In [12], Wilson and Martinez (2000) presented five algorithms for reducing the size of case bases: DROP1, DROP2, ..., DROP5. Decremental reduction optimization procedure 1 (DROP1) is the basic removal scheme based on so-called associate patterns. The associate patterns of some pattern p are the patterns that have p as one of their K nearest neighbors. The removal of p is determined based on its effect on the classification of its associates. DROP2 is a modification whereby the order of the patterns to be removed is selected according to a certain distance criterion, so as to remove patterns furthest from the decision boundary first. DROP2 also differs from DROP1 in that deletion decisions still rely on the original set of associates. DROP3, DROP4, and DROP5 are versions in which a noise-filtering pass is performed prior to applying the DROP2 procedure. In [13], Mollineda et al. (2002) obtained a condensed 1-NN classifier by merging the nearest clusters of the same class as long as the set of new representatives correctly classifies all the original patterns. In [14], Wu et al. (2002) proposed an efficient method to reduce the training set required for KNN while maintaining the same level of classification accuracy, namely, the improved KNN (IKNN). This is implemented by iteratively eliminating patterns that exhibit high attractive capacities (the attractive capacity s_y of a pattern y is defined as the number of patterns from class C(y) that are closer to pattern y than other patterns belonging to other classes). The algorithm filters out a large portion of prototypes that are unlikely to match against the unknown pattern. This accelerates the classification procedure considerably, especially in cases where the dimensionality of the feature space is high. Other approaches for condensing are based on: 1) evolutionary algorithms and decision trees [15], 2) space partitioning [16], 3) decision boundary preservation [17], 4) estimation of the distribution of representatives according to the information they convey [18], 5) a gradient-descent technique for learning prototype positions and local metric weights [19], and 6) incorporation of both the proximity between patterns and the geometrical distribution around the given pattern [20]. There are other methods that combine both editing and condensing techniques, forming a hybrid model [21].

In this brief, we introduce a new condensing algorithm, namely, template reduction for KNN (TRKNN). The basic idea is to define a chain of nearest neighbors. By setting a cutoff value for the distances along the chain, we effectively separate the patterns into the selected condensed set (probably consisting of patterns near the decision boundary) and the removed set (probably interior patterns). The paper is organized as follows. The proposed TRKNN method is described in Section II. Some analytical insights are introduced in Section III. Then the proposed method is validated experimentally in Section IV. Results are discussed in Section V. Finally, conclusions are drawn in Section VI.

II. TEMPLATE REDUCTION FOR KNN

The goal of the proposed approach is to discard prototypes that are far from the boundaries and have little influence on the KNN classification. To establish this, we first introduce the so-called nearest neighbor chain. This is simply a sequence of nearest neighbors from alternating classes. Consider first the following definitions. Consider a pattern x_i (also call it x_{i0}, and let it be from class ω_m). Let x_{i1} = NN(x_{i0}) denote the nearest neighbor to x_{i0} that is from a different class. Similarly, let x_{i2} = NN(x_{i1}) denote the nearest neighbor to x_{i1} that is from the starting class ω_m. We continue in this manner, with x_{i,j+1} = NN(x_{ij}). This sequence of x_{ij}'s (whose class memberships alternate between class ω_m and the other classes) constitutes the nearest neighbor chain. Below is the precise definition.

Definition: A nearest neighbor chain C_i of a pattern x_i (of class ω_m) is defined as the sequence x_{i0}, x_{i1}, x_{i2}, ..., x_{ik} together with the sequence d_{i0}, d_{i1}, ..., d_{i,k-1}, where the root pattern x_{i0} = x_i, and x_{ij} is the closest pattern to x_{i,j-1} (of a class different from ω_m if j is odd, and of class ω_m if j is even). Moreover, d_{ij} = ||x_{i,j+1} − x_{ij}||_2 is the Euclidean distance between patterns x_{i,j+1} and x_{ij}. The chain is stopped at x_{ik} if x_{i,k+1} = x_{i,k-1}. Note that the distance sequence is a nonincreasing sequence (i.e., d_{ij} ≥ d_{i,j+1}).

Fig. 1 shows some examples of constructed chains for a two-class problem. In summary, a chain is constructed as follows. Start from a pattern x_i. Find the nearest neighbor from a different class. Then, from that pattern, find the nearest neighbor from the starting class. Continue in this manner until we end up with two patterns that are nearest neighbors of each other. Note that by construction the distances between the patterns in the chain form a nonincreasing sequence. Note also that patterns downstream in the chain will probably be close to the classification boundary, because they will have smaller distances from the patterns of different classes. This provides the basis of the proposed condensing procedure.

The basic idea of the proposed condensing approach is as follows. For each pattern x_i in the training set, we construct its corresponding chain C_i. The pattern x_{ij} in the chain is dropped (from the selected condensed set) if d_{ij} > α · d_{i,j+1}, where α is a threshold greater than 1 and j = 0, 2, 4, ... up to the size of the chain. Note that we allow only patterns from the same class as that of x_i to be eliminated (i.e., we consider only the even-indexed patterns in the chain). This is important when dealing with a multiclass problem, as the chain is constructed using the one-against-all concept illustrated earlier. Typically, when starting the chain with an interior point, the distance to the next point in the chain will be large. As the chain converges onto the boundary points, the distances decrease in value and more or less level off. This gives a rationale for the proposed cutoff procedure: if there is a significant decrease in the distances, the considered pattern is deemed to be probably an interior point and can be discarded, whereas if the distances do not decrease too much, then we are probably oscillating around the classification boundary, and the pattern is kept.
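To make the procedure concrete, the following is a minimal Python/NumPy sketch of the chain construction and cutoff rule just described. It is an illustrative reimplementation, not the authors' code (the brief's experiments were run in MATLAB): the function names, the brute-force nearest-neighbor search, and the tie handling are our own choices, and at least two patterns per class are assumed. The formal listing of the algorithm is given below.

import numpy as np

# Illustrative sketch (not the authors' code): nearest-neighbor chains and the
# TRKNN cutoff rule, using a brute-force distance search for clarity.
# Assumes a numeric pattern matrix X (n x D), integer labels y, and >= 2 classes.

def _nearest(x, X, candidates):
    # Index (taken from `candidates`) of the pattern closest to x.
    d = np.linalg.norm(X[candidates] - x, axis=1)
    return int(candidates[np.argmin(d)])

def build_chain(i, X, y):
    # Chain rooted at pattern i: alternately take the nearest neighbor from the
    # other classes (one-against-all) and from the root's own class; stop when
    # the last two patterns are mutual nearest neighbors (x_{i,k+1} = x_{i,k-1}).
    same = np.flatnonzero(y == y[i])
    other = np.flatnonzero(y != y[i])
    chain, dists = [int(i)], []
    while True:
        cur = chain[-1]
        pool = other if len(chain) % 2 == 1 else same   # alternate the class pools
        nxt = _nearest(X[cur], X, pool)
        if nxt in chain:        # stop condition (membership test also guards against ties)
            break
        dists.append(float(np.linalg.norm(X[nxt] - X[cur])))   # d_{ij}
        chain.append(nxt)
    return chain, dists

def trknn_condense(X, y, alpha=1.2):
    # Mark x_{ij} (even j, i.e., same class as the root) whenever
    # d_{ij} > alpha * d_{i,j+1}; keep everything that is never marked.
    marked = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        chain, d = build_chain(i, X, y)
        for j in range(0, len(d) - 1, 2):
            if d[j] > alpha * d[j + 1]:
                marked[chain[j]] = True
    return X[~marked], y[~marked]

In practice, the pairwise distance matrix would be computed once and reused inside the search (as noted in the complexity discussion of Section V), rather than recomputing norms for every chain step as done here for brevity.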


Fig. 1. Illustrative example of the chains.

Below are the precise steps of the condensing method.

Algorithm: TRKNN

Inputs:
    Training set T.
    Distance ratio threshold α.
Output:
    Reduced training set T'.
Method:
    For each pattern x_i in T
        Find its corresponding chain C_i
    End For
    For each chain C_i
        For j = 0, 2, 4, ... up to the size of C_i
            If d_{ij} > α · d_{i,j+1} then mark the pattern x_{ij}
        End For
    End For
    Drop all marked patterns from T.

Fig. 1 shows an example that illustrates the working of the algorithm. The closest pattern of a different class to pattern x_1 is x_{11}, and the distance between them is d_{11}. Similarly, the closest pattern of a different class to pattern x_{11} is x_{12}, and the distance between them is d_{12}. Now x_1 is dropped if d_{11} > α · d_{12}.

Note that some computational savings can be achieved by observing that some chains encompass other, smaller chains. By operating on the larger chains first, we automatically get the condensing results of the smaller chains contained in them, leading to a computational shortcut. Note also that the condensed set does not depend on the pattern presentation order, unlike many condensing algorithms; the reason is that at each step the full training set is considered as a whole. Another comment, which applies also to most condensing methods, is that after condensing it is imperative to make the number of neighbors K (for the KNN classification) a little smaller than the K used with the whole data set. This is due to the somewhat redundant patterns being removed. This reduction will be bigger if more data is removed.

TABLE I
SUMMARY OF DATA SETS

III. ANALYTICAL INSIGHTS

The proposed algorithm relies on the concept that a pattern near the classification boundary will tend to have a nearest neighbor (from another class) that is fairly close. On the other hand, for an interior pattern that distance will tend to be larger. Based on this concept, we discard the latter patterns. While this concept is fairly intuitive, it is beneficial to provide some analytical treatment in order to gain insight and to understand the degree of, and the factors affecting, that relationship. Below we develop some approximate analysis.

For simplicity, consider a two-class case. Consider a pattern x_0 from class ω_1, and let the dimension of the pattern vector be D. Moreover, let p(x|ω_2) denote the class-conditional density for class ω_2, and let there be N training patterns from class ω_2. Finally, let NN(x_0) denote the nearest neighbor to pattern x_0 that is from class ω_2, and let r be the distance from x_0 to NN(x_0). Define the following random variable:

δ = ∫_{||x − x_0|| ≤ r} p(x|ω_2) dx.


TABLE II
SUMMARY OF AVERAGE TRAINING TIME AND TEST TIME IN SECONDS

TABLE III
MEAN NUMBER OF PROTOTYPES (NPROT) AND MEAN TEST ERROR RATES (ERR) OVER THE DIFFERENT FOLDS.
STANDARD DEVIATIONS ARE SHOWN IN BRACKETS

It is well known (see [22] and [23]) that such a variable obeys the following beta density function:

p(δ) = N (1 − δ)^{N−1},  0 ≤ δ ≤ 1.

Assume that the number of training patterns from class ω_2 is sufficiently large such that NN(x_0) is close to x_0. Hence, within the ball centered around x_0 and stretching out to NN(x_0), the density p(x|ω_2) can be approximated as almost constant, and hence

δ ≈ c · r^D · p(x_0|ω_2)

where the term c · r^D represents the volume of a D-dimensional ball of radius r, with c = π^{D/2} / Γ(1 + D/2). Then, we get

r ≈ (δ / (c · p(x_0|ω_2)))^{1/D}.

The expectation of r is then given by

E(r) = ∫_0^1 (δ / (c · p(x_0|ω_2)))^{1/D} N (1 − δ)^{N−1} dδ

which can be evaluated as

E(r) = [Γ(N+1) Γ(1 + 1/D) (Γ(1 + D/2))^{1/D}] / [√π · Γ(N + 1 + 1/D)] · p(x_0|ω_2)^{−1/D}.
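The intermediate step behind this evaluation is the standard beta integral; the rearrangement below is ours, under the same constant-density approximation used above:

\begin{aligned}
E(r) &\approx \frac{N}{\bigl(c\,p(x_0\mid\omega_2)\bigr)^{1/D}}\int_0^1 \delta^{1/D}(1-\delta)^{N-1}\,d\delta
      = \frac{N\,B\!\left(1+\tfrac{1}{D},\,N\right)}{\bigl(c\,p(x_0\mid\omega_2)\bigr)^{1/D}} \\
     &= \frac{\Gamma(N+1)\,\Gamma\!\left(1+\tfrac{1}{D}\right)}{\Gamma\!\left(N+1+\tfrac{1}{D}\right)}\,
        \bigl(c\,p(x_0\mid\omega_2)\bigr)^{-1/D},
\qquad c=\frac{\pi^{D/2}}{\Gamma\!\left(1+\tfrac{D}{2}\right)},
\end{aligned}

and substituting c yields the closed form above.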
The previous relation confirms the fact that the distance to the nearest neighbor from the other class is small if we are close to the classification boundary (where the opposing class-conditional density p(x|ω_2) would be high). Conversely, that distance would be large if p(x|ω_2) were small (signifying that we are in the interior and far away from the boundary). Moreover, that monotone relationship decays more slowly for large-dimensional problems. One might contemplate the situation when we are near the boundary but p(x|ω_2) is still small. This situation arises when the other class-conditional density p(x|ω_1) is also small, that is, we operate in a relatively sparse area of the space. To compensate for that, the algorithm uses the other distances in the chain to obtain a relative cutoff point. That is, we discard patterns based on comparing them according to the successive distances in the chain (i.e., when d_{ij} > α · d_{i,j+1}).

IV. EXPERIMENTAL RESULTS

To validate the proposed algorithm, we compared it with the traditional KNN and with two condensing methods, DROP2 [12] and IKNN [14], on several real-world data sets. Note that, as we focus on comparing condensing methods, we do not employ DROP3, DROP4, or DROP5 [12], as these just add preprocessing steps that can be applied to any method. On the other hand, DROP1's accuracy is significantly low compared to KNN and DROP2. Therefore, to attain a fair comparison, DROP2 is included in the comparison. We use the fivefold validation procedure for the purpose of tuning the key parameters of each method. In this approach, the training set is partitioned into five equal parts. Training is performed on four parts and validated on the fifth part. Then the validation part is rotated and training is performed again. The process is repeated five times, and the validation classification error on all five parts is combined. The parameter values that yield the best validation results are then chosen for testing the performance. The tuned parameters are as follows. In all methods, the suggested values for K are the odd values from 1 to 9. For IKNN, the suggested values for the attractive capacity threshold (S) are [0.01, 0.05, 0.1] × N_min, where N_min is the minimum number of patterns belonging to a single class, while the portion function is taken proportional to (t + 1)^{−0.5}, with the proportionality constant in {0.1, 0.2} (for more details and a description of the parameters, see [14]). For TRKNN, the suggested values for α are 1.2, 1.4, and 1.6. The distance metric used is the Euclidean distance for all methods. Concerning the DROP2 method, there are no tunable parameters.
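For concreteness, a sketch of this tuning loop is given below. It is illustrative only: `classify_error` is an assumed helper (not defined in the brief) that condenses the training part with TRKNN at a given α and returns the error of KNN with the given K on the validation part.

import numpy as np
from itertools import product

def tune_trknn(X, y, classify_error, ks=(1, 3, 5, 7, 9), alphas=(1.2, 1.4, 1.6),
               n_parts=5, seed=0):
    # Fivefold validation over (K, alpha): train on four parts, validate on the
    # fifth, rotate the validation part, and keep the pair with the lowest
    # combined validation error.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_parts)
    best = None
    for K, alpha in product(ks, alphas):
        errs = []
        for v in range(n_parts):
            va = folds[v]
            tr = np.concatenate([folds[p] for p in range(n_parts) if p != v])
            errs.append(classify_error(X[tr], y[tr], X[va], y[va], K, alpha))
        score = float(np.mean(errs))
        if best is None or score < best[0]:
            best = (score, K, alpha)
    return best   # (validation error, K, alpha)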
There are four main performance measures. The training time represents the time it takes to condense the training set, including searching for the optimal parameters, such as K and the others. (However, it is computed as the average time per tested parameter set. This way we avoid penalizing methods that have a larger number of parameters, such as the competing IKNN, or a finer parameter grid.)


TABLE IV
COMBINED 5 × 2 CROSS-VALIDATION F TEST FOR THE TEST CLASSIFICATION ERROR (ERR) AND THE NUMBER OF PROTOTYPES (NPROT). REJECTION DECISION OF THE NULL HYPOTHESIS IS SHOWN IN BRACKETS

Fig. 2. Test classification error versus the number of prototypes over the 5 × 2 folds for the Pima Indians data set.

Fig. 3. Test classification error versus the number of prototypes over the 5 × 2 folds for the Breast Cancer data set.

The testing time represents the classification time using the condensed set. It is very closely tied to the third measure, which is the number of patterns in the condensed set. The test classification error is the final performance measure. The main goal of any condensing method is to reduce the number of patterns as much as possible, with as little sacrifice as possible in classification accuracy.

In comparing the classification error and the number of prototypes of any two methods, a challenging issue is to test whether the difference is statistically significant. We have used the combined 5 × 2 cross-validation F test [24] (a study in [25] shows the superiority of this test compared to alternative ones). To apply this test, five replications of twofold cross validation are performed. In each replication, the data set is divided into two equal-sized sets, one for training and the other for testing. Let p_i^{(j)} be the difference between the error rates of the two classifiers on fold j = 1, 2 of replication i = 1, ..., 5. Let the average on replication i be p̄_i = (p_i^{(1)} + p_i^{(2)}) / 2 and let the estimated variance be s_i^2 = (p_i^{(1)} − p̄_i)^2 + (p_i^{(2)} − p̄_i)^2. The combined 5 × 2 cross-validation F test is applied by assuming that the statistic

f = [ Σ_{i=1}^{5} Σ_{j=1}^{2} (p_i^{(j)})^2 ] / [ 2 Σ_{i=1}^{5} s_i^2 ]

has approximately an F distribution with ten and five degrees of freedom. The null hypothesis is rejected with 95% confidence if f is greater than 4.74.
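In code, the statistic is a direct transcription of the formula above; the sketch below is ours, and the 5 × 2 array `p` of per-fold error-rate differences is assumed to have been collected as just described.

import numpy as np

def combined_5x2cv_f(p):
    # p[i, j]: difference in error rates on fold j (j = 0, 1) of replication i (i = 0..4).
    p = np.asarray(p, dtype=float)            # shape (5, 2)
    p_bar = p.mean(axis=1, keepdims=True)     # per-replication average
    s2 = ((p - p_bar) ** 2).sum(axis=1)       # per-replication variance estimate
    return (p ** 2).sum() / (2.0 * s2.sum())  # ~ F(10, 5) under the null hypothesis

# Reject the null hypothesis at the 95% level when the returned statistic exceeds 4.74
# (the critical value quoted in the text).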


Fig. 4. Test classification error versus the number of prototypes over the 5 × 2 folds for the Balance Scale data set.

Fig. 5. Test classification error versus the number of prototypes over the 5 × 2 folds for the Landsat data set.

We used five real-world data sets. The first data set was obtained from the cancer1.dt file of the Proben1 database [26], which was created based on the Breast Cancer Wisconsin problem data set from the University of California at Irvine (UCI) Machine Learning Repository database [27]. The second data set was obtained from the diabetes1.dt file of the Proben1 database [26], which was created based on the Pima Indians Diabetes problem data set from the UCI Machine Learning Repository database [27]. The remaining data sets were obtained from the UCI repository [27]. A summary of the data sets is shown in Table I. It shows the names and the details of the data sets, such as the number of patterns, the number of features, and the number of classes. We performed the implementation using MATLAB 7 on Windows XP with SP2 running on a Pentium IV 2.4-GHz PC with 768-MB RAM.

V. RESULTS

The results for the average training time over all folds and the test time (in seconds) are shown in Table II, while the number of prototypes and the test classification error rates are shown in Table III. The results of the 5 × 2 significance test applied to the test classification error and to the number of prototypes are shown in Table IV for the different data sets. The table shows that for all data sets the F test at the 95% confidence level does not reject the hypothesis that IKNN and TRKNN give a similar test error, and also does not reject the hypothesis that KNN and TRKNN give a similar test error. On the other hand, TRKNN gives a smaller number of prototypes (i.e., a higher reduction rate) than IKNN for two of the data sets (Pima Indians and Balance Scale), also at the 95% level. For the three other data sets, TRKNN's outperformance is not significant at the 95% level. Compared to DROP2, we observe the following. For the Pima Indians data set, DROP2 gets a significantly lower number of prototypes, but this is at the expense of a significantly worse test error (as if DROP2 dropped too many patterns, to an extent that it affected classification performance). On the other hand, for the Pendigits data set, TRKNN produces a significantly lower number of prototypes, while the test error is comparable.

Fig. 6. Test classification error versus the number of prototypes over the 5 × 2 folds for the Pendigits data set.

We note that there is generally a tradeoff between the test error and the number of prototypes (NPROT) selected (or the reduction ratio). To clarify that the proposed TRKNN is winning in the overall NPROT/test error tradeoff, we performed the following analysis. It is based on the performed 5 × 2 test. As mentioned, we perform the test ten times (on the two folds times the five partitions). Consider a single test (i.e., on one specific fold and one specific partition) and record NPROT and the test error. Repeat for the ten tests and get ten pairs of NPROT/test error numbers. Plot these pairs as points in the 2-D space with the axes being NPROT and the test error. We will have ten points for TRKNN, corresponding to the NPROT/test error outcomes of the ten test sets, ten other points for IKNN, corresponding also to the test outcomes for IKNN, and ten other points for DROP2. Figs. 2-6 show the plots for each of the tested UCI data sets. One can see in the plots that for all problems the test errors for both TRKNN and IKNN are comparable. On the other hand, for three of the problems (Breast Cancer, Pima Indians, and Balance Scale) the number of prototypes for TRKNN is significantly lower than that of IKNN. For one problem (Pendigits), TRKNN wins in the NPROT aspect (i.e., gives lower NPROT), but by a small amount. For the remaining problem (Landsat), both methods are about equal. When we say one method "significantly beats" another, it is based on the fact that the averages are different and the standard deviations do not lead to overlapping of the points, which can be seen visually in the plot. For some of the problems (such as Balance Scale), IKNN gives a lower average test error, but there is a large overlap of the points (in the test error dimension), and that makes the difference not statistically significant. Concerning DROP2 versus TRKNN, as observed before, one can see that DROP2 obtains a lower NPROT at the expense of a worse test error for the Pima Indians data set. On the other hand, for both Landsat and Pendigits, DROP2 produces a significantly worse (i.e., higher) NPROT. This happens while the test errors for DROP2 and TRKNN are comparable. One can see the clear separation in the NPROT dimension, while in the test error dimension the data overlap. It seems that possibly for larger data sets DROP2 does not prune out enough points (which paradoxically are the type of problems where we need to drop points the most).
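The scatter analysis described above is straightforward to reproduce; a minimal matplotlib sketch follows (the `results` dictionary of per-fold (NPROT, error) pairs is a placeholder for the recorded outcomes, not data from the brief).

import matplotlib.pyplot as plt

def nprot_error_scatter(results):
    # results: {method name: (list of 10 NPROT values, list of 10 test errors)}.
    markers = {"TRKNN": "o", "IKNN": "s", "DROP2": "^"}
    for name, (nprot, err) in results.items():
        plt.scatter(nprot, err, marker=markers.get(name, "x"), label=name)
    plt.xlabel("Number of prototypes (NPROT)")
    plt.ylabel("Test classification error")
    plt.legend()
    plt.show()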


Concerning the training time (Table II), the time of TRKNN is considerably shorter than that of IKNN, by a factor of around 2 or 3. This is because the training time of TRKNN is dominated by the computation of the distance matrix (the matrix that holds the distances between all pairs of training patterns), whose complexity is O(n^2), where n is the training set size. This distance matrix is computed only once at the beginning of the training process. For IKNN, besides the computation of the distance matrix, there is an extra computation at each iteration for the evaluation and sorting of the attractive capacities, with complexity O(nA log A), where A is the average attractive capacity. For large data sets, this attractive capacity A could be rather large. This computation is repeated for a number of iterations (say J), leading to an extra complexity of O(nJA log A) beyond that of TRKNN. Similarly, TRKNN is faster (in training speed) than DROP2, by a factor of around 3 or 4. The reason for the slow training speed of DROP2 is the need to sort the distances computed from each pattern to its enemy pattern, where the enemy pattern is defined as the closest pattern from a different class.

Overall, viewing all performance criteria, such as the test error/number of prototypes tradeoff and the speed, we feel that TRKNN has an edge over the competing IKNN and DROP2 methods.

VI. CONCLUSION

In this brief, a new condensing method for KNN is proposed. The method drops patterns that are far away from the boundary and thus have little influence on the KNN classification. Experiments show that the proposed approach reduces the template set size without sacrificing accuracy compared to the traditional KNN and two recent condensing methods. In addition, the method is simple to implement and computationally fast.

REFERENCES

[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[3] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, ser. Statistics. Berlin, Germany: Springer-Verlag, 2001.
[4] A. R. Webb, Statistical Pattern Recognition, 2nd ed. London, U.K.: Wiley, 2002.
[5] W. Duch, "Similarity based methods: A general framework for classification, approximation and association," Control Cybern., vol. 29, no. 4, pp. 937–968, 2000.
[6] D. L. Wilson, "Asymptotic properties of nearest neighbor rules using edited data," IEEE Trans. Syst., Man, Cybern., vol. SMC-2, no. 3, pp. 408–420, Jul. 1972.
[7] P. A. Devijver and J. Kittler, "On the edited nearest neighbor rule," in Proc. 5th Int. Conf. Pattern Recognit., Miami, FL, 1980, pp. 72–80.
[8] F. J. Ferri and E. Vidal, "Colour image segmentation and labeling through multiedit-condensing," Pattern Recognit. Lett., vol. 13, pp. 561–568, 1992.
[9] P. E. Hart, "The condensed nearest neighbor rule," IEEE Trans. Inf. Theory, vol. IT-14, no. 3, pp. 515–516, May 1968.
[10] W. Gates, "The reduced nearest neighbor rule," IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 431–433, May 1972.
[11] B. K. Bhattacharya, R. S. Poulsen, and G. T. Toussaint, "Application of proximity graphs to editing nearest neighbor decision rules," in Proc. 16th Symp. Interface Between Comput. Sci. Statist., 1984, pp. 97–108.
[12] D. R. Wilson and T. R. Martinez, "Reduction techniques for instance-based learning algorithms," Mach. Learn., vol. 38, no. 3, pp. 257–286, 2000.
[13] R. A. Mollineda, F. J. Ferri, and E. Vidal, "An efficient prototype merging strategy for the condensed 1-NN rule through class-conditional hierarchical clustering," Pattern Recognit., vol. 35, pp. 2771–2782, 2002.
[14] Y. Wu, K. Ianakiev, and V. Govindaraju, "Improved k-nearest neighbor classification," Pattern Recognit., vol. 35, pp. 2311–2318, 2002.
[15] J. R. Cano, F. Herrera, and M. Lozano, "Using evolutionary algorithms as instance selection for data reduction in KDD: An experimental study," IEEE Trans. Evol. Comput., vol. 7, no. 6, pp. 561–575, Dec. 2003.
[16] J. S. Sánchez, "High training set size reduction by space partitioning and prototype abstraction," Pattern Recognit., vol. 37, no. 7, pp. 1561–1564, 2004.
[17] R. Barandela, F. J. Ferri, and J. S. Sánchez, "Decision boundary preserving prototype selection for nearest neighbor classification," Int. J. Pattern Recognit. Artif. Intell., vol. 19, no. 6, pp. 787–806, 2005.
[18] D. Huang and T. W. S. Chow, "Enhancing density-based data reduction using entropy," Neural Comput., vol. 18, no. 2, pp. 470–495, 2006.
[19] R. Paredes and E. Vidal, "Learning prototypes and distances: A prototype reduction technique based on nearest neighbor error minimization," Pattern Recognit., vol. 39, no. 2, pp. 171–179, 2006.
[20] J. S. Sánchez and A. I. Marqués, "An LVQ-based adaptive algorithm for learning from very small codebooks," Neurocomputing, vol. 69, no. 7–9, pp. 922–927, 2006.
[21] H. Brighton and C. Mellish, "Advances in instance selection for instance-based learning algorithms," Data Mining Knowl. Disc., vol. 6, pp. 153–172, 2002.
[22] D. A. Fraser, Nonparametric Methods in Statistics. New York: Wiley, 1957, ch. 4.
[23] R. D. Short and K. Fukunaga, "The optimal distance measure for nearest neighbor classification," IEEE Trans. Inf. Theory, vol. IT-27, no. 5, pp. 622–627, Sep. 1981.
[24] E. Alpaydin, "Combined 5 × 2 cv F test for comparing supervised classification learning algorithms," Neural Comput., vol. 11, pp. 1885–1892, 1999.
[25] T. G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms," Neural Comput., vol. 10, pp. 1895–1923, 1998.
[26] L. Prechelt, "Proben1: A set of neural-network benchmark problems," University of Karlsruhe, Germany, 1994. [Online]. Available: http://page.mi.fu-berlin.de/prechelt/Biblio/1994-21.pdf
[27] C. L. Blake and C. J. Merz, "UCI Repository of Machine Learning Databases," Dept. Inf. Comput. Sci., Univ. California, Irvine, Irvine, CA, 1998. [Online]. Available: http://www.ics.uci.edu/~mlearn
