A Novel Template Reduction Approach for the K-Nearest Neighbor Method

Hatem A. Fayed and Amir F. Atiya

Abstract: The $K$-nearest neighbor (KNN) rule is one of the most widely used pattern classification algorithms. For large data sets, the computational demands for classifying patterns using KNN can be prohibitive. A way to alleviate this problem is through the condensing approach. This means we remove patterns that are more of a computational burden but do not contribute to better classification accuracy. In this brief, we propose a new condensing algorithm. The proposed idea is based on defining the so-called chain. This is a sequence of nearest neighbors from alternating classes. We make the point that patterns further down the chain are close to the classification boundary, and based on that we set a cutoff for the patterns we keep in the training set. Experiments show that the proposed approach effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN. Moreover, it is a simple and fast condensing algorithm.

Index Terms: Condensing, cross validation, editing, $K$-nearest neighbor (KNN), template reduction.

Manuscript received August 24, 2007; revised December 21, 2008; accepted March 11, 2009. First published April 21, 2009; current version published May 01, 2009.
H. A. Fayed is with the Department of Engineering Mathematics and Physics, Cairo University, Cairo 12613, Egypt (e-mail: [email protected]).
A. F. Atiya is with the Department of Computer Engineering, Cairo University, Cairo 12613, Egypt (e-mail: [email protected]).
Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNN.2009.2018547

I. INTRODUCTION

The $K$-nearest neighbor (KNN) classification rule is one of the most well-known and widely used nonparametric pattern classification methods. Its simplicity and effectiveness have led it to be widely used in a large number of classification problems, including handwritten digits, satellite image scenes, and medical diagnosis [1]–[5]. For KNN, however, two major outstanding problems are yet to be resolved by the research community. The first issue is the selection of the best $K$ (the number of neighbors to consider), as this problem is greatly affected by the finite sample nature of the problem. The second issue is the computational and the storage issue. The traditional KNN rule requires the storage of the whole training set, which may be an excessive amount of storage for large data sets and leads to a large computation time in the classification stage. There are two well-known procedures for reducing the number of prototypes (sometimes referred to as template reduction techniques). The first approach, called editing, processes the training set with the aim of increasing generalization capabilities. This is accomplished by removing prototypes that contribute to the misclassification rate, for example, removing outlier patterns or removing patterns that are surrounded mostly by others of different classes [6]–[8]. The second approach is called condensing. The aim of this approach is to obtain a small template that is a subset of the training set without changing the nearest neighbor decision boundary substantially. The idea is that the patterns near the decision boundary are crucial to the KNN decision, but those far from the boundary do not affect the decision. Therefore, a systematic removal of these ineffective patterns helps to reduce the computation time. This can be established by reducing the number of prototypes that are centered in dense areas of the same class [9]–[20]. In this brief, we consider only
the condensing approach. Below is a short summary of some existing algorithms for the condensing approach.

In 1968, Hart [9] was the first to propose an algorithm for reducing the size of the stored data for the nearest neighbor decision (the algorithm is called CNN). Hart defined a consistent subset of the data as one that classifies the remaining data correctly with the nearest neighbor rule. He built this consistent set by sequentially adding to it data points from the training set as long as the added data point is misclassified (using the 1-NN rule). By construction, the resulting reduced subset classifies all the training data correctly. Empirical results have shown that Hart's CNN rule considerably reduces the size of the training set at the expense of minimal or even no degradation in classification performance. The drawback of CNN is that frequently it may keep some points that are far from the decision boundary. To combat this, in 1972, Gates [10] proposed what he called the reduced nearest neighbor rule (RNN). This method is based on first applying CNN and then performing a postprocessing step. In this postprocessing step, the data points in the consistent set are revisited and removed if their deletion does not result in misclassifying any point in the training set. Experimental results confirmed that RNN yields a slightly smaller training subset than that obtained with CNN.
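As a concrete illustration of the consistent-subset construction just described, the following is a minimal Python sketch of Hart's CNN rule; the function name hart_cnn, the seeding with the first training pattern, and the repetition of the sweep until no further additions occur are implementation choices made here for illustration rather than details taken from [9].

```python
import numpy as np

def hart_cnn(X, y):
    """Illustrative sketch of Hart's condensed nearest neighbor (CNN) rule:
    grow a subset by adding every training pattern that the current subset
    misclassifies under the 1-NN rule, sweeping until no additions occur.
    X and y are NumPy arrays (patterns and class labels)."""
    subset = [0]                                   # seed with an arbitrary pattern
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in subset:
                continue
            d = np.linalg.norm(X[subset] - X[i], axis=1)
            if y[subset][np.argmin(d)] != y[i]:    # misclassified by the current subset
                subset.append(i)                   # so add it to the subset
                changed = True
    return np.array(subset)                        # indices of the consistent subset
```

By construction, the returned subset classifies the entire training set correctly under the 1-NN rule, which is the consistency property referred to above.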
In [11], Bhattacharya et al. (1992) proposed two methods, one based on the Voronoi graph and the other based on the Gabriel graph. The methods have the merit that they are exact and yield sets independent of the order in which the data are processed. The method based on a Voronoi graph yields a condensed set which is both training-set consistent (i.e., it classifies all the training data correctly) and decision-boundary consistent (i.e., it determines exactly the same decision boundary as that of the entire training set). However, it suffers from a large complexity due to the need to construct the Voronoi diagram. On the other hand, the method based on the Gabriel diagram is faster, but it is neither decision-boundary consistent nor training-set consistent. In [12], Wilson and Martinez (2000) presented five algorithms for reducing the size of case bases: DROP1, DROP2, ..., DROP5. Decremental reduction optimization procedure 1 (DROP1) is the basic removal scheme based on so-called associate patterns. The associate patterns for some pattern $p$ are the patterns which have $p$ as one of their $K$-nearest neighbors. The removal of $p$ is determined based on its effect on the classification of its associates. DROP2 is a modification whereby the order of the patterns to be removed is selected according to a certain distance criterion in a way that removes patterns furthest from the decision boundary first. DROP2 also differs from DROP1 in that deletion decisions still rely on the original set of associates. DROP3, DROP4, and DROP5 are versions whereby a noise-filtering pass is performed prior to applying the DROP2 procedure. In [13], Mollineda et al. (2002) obtained a condensed 1-NN classifier by merging the same-class nearest clusters as long as the set of new representatives correctly classifies all the original patterns. In [14], Wu et al. (2002) proposed an efficient method to reduce the training set required for KNN while maintaining the same level of classification accuracy, namely, the improved KNN (IKNN). This is implemented by iteratively eliminating patterns that exhibit high attractive capacities (the attractive capacity $s_y$ of a pattern $y$ is defined as the number of patterns from class $C(y)$ that are closer to pattern $y$ than other patterns belonging to other classes). The algorithm filters out a large portion of prototypes that are unlikely to match against the unknown pattern. This accelerates the classification procedure considerably, especially in cases where the dimensionality of the feature space is high. Other approaches for condensing are based on: 1) evolutionary algorithms and decision trees [15], 2) space partitioning [16], 3) decision boundary preservation [17], 4) estimation of the distribution of representatives according to the information they convey [18], 5) a gradient-descent technique for learning prototype positions and local metric weights [19], and 6) incorporation of both the proximity between patterns and the geometrical distribution around the given pattern [20]. There are other methods that combine both editing and condensing techniques, forming a hybrid model [21].

In this brief, we introduce a new condensing algorithm, namely, template reduction for KNN (TRKNN). The basic idea is to define a chain of nearest neighbors. By setting a cutoff value for the distances among the chain, we effectively separate the patterns into the selected condensed set (probably consisting of patterns near the decision boundary) and the removed set (probably interior patterns). The paper is organized as follows. The proposed TRKNN method is described in Section II. Some analytical insights are introduced in Section III. Then the proposed method is validated experimentally in Section IV. Results are discussed in Section V. Finally, conclusions are drawn in Section VI.

II. TEMPLATE REDUCTION FOR KNN

The goal of the proposed approach is to discard prototypes that are far from the boundaries and have little influence on the KNN classification. To establish this, we first introduce the so-called nearest neighbor chain. This is simply a sequence of the nearest neighbors from alternating classes. Consider first the following definitions. Consider a pattern $x_i$ (also call it $x_{i0}$ and let it be from class $\omega_m$). Let $x_{i1} = \mathrm{NN}(x_{i0})$ denote the nearest neighbor to $x_{i0}$ that is from a different class. Similarly, let $x_{i2} = \mathrm{NN}(x_{i1})$ denote the nearest neighbor to $x_{i1}$ that is from the starting class ($\omega_m$). We continue in this manner, with $x_{i,j+1} = \mathrm{NN}(x_{ij})$. This sequence of $x_{ij}$'s (whose class memberships alternate between class $\omega_m$ and the other classes) constitutes the nearest neighbor chain. Below is the precise definition.

Definition: A nearest neighbor chain $C_i$ of a pattern $x_i$ (of class $\omega_m$) is defined as the sequence $x_{i0}, x_{i1}, x_{i2}, \ldots, x_{ik}$ and the sequence $d_{i0}, d_{i1}, \ldots, d_{i,k-1}$, where the root pattern $x_{i0} = x_i$, and $x_{ij}$ is the closest pattern to $x_{i,j-1}$ (of a class different from $\omega_m$ if $j$ is odd and of class $\omega_m$ if $j$ is even). Moreover, $d_{ij} = \|x_{i,j+1} - x_{ij}\|_2$ is the Euclidean distance between patterns $x_{i,j+1}$ and $x_{ij}$. The chain is stopped at $x_{ik}$ if $x_{i,k+1} = x_{i,k-1}$. Note that the distance sequence is a nonincreasing sequence (i.e., $d_{ij} \ge d_{i,j+1}$).

Fig. 1 shows some examples of constructed chains for a two-class problem. In summary, a chain is constructed as follows. Start from a pattern $x_i$. Find the nearest neighbor from a different class. Then, from that pattern find the nearest neighbor from the starting class. Continue in this manner until we end up with two patterns that are nearest neighbors of each other. Note that by construction the distances between the patterns in the chain form a nonincreasing sequence. Note also that patterns downstream in the chain will probably be close to the classification boundary, because they will have smaller distances from the patterns of different classes. This provides the basis of the proposed condensing procedure.

The basic idea of the proposed condensing approach is as follows. For each pattern $x_i$ in the training set, we construct its corresponding chain $C_i$. The pattern $x_{ij}$ in the chain is dropped (from the selected condensed set) if $d_{ij} > \alpha \cdot d_{i,j+1}$, where $\alpha$ is a threshold $> 1$, and $j = 0, 2, 4, \ldots$ up to the size of the chain. Note that we allow only patterns from the same class as that of $x_i$ to be eliminated (i.e., we consider only the even patterns in the chain). This is important when dealing with a multiclass problem, as the chain is constructed using the one-against-all concept, as has been illustrated earlier. Typically, when starting the chain with an interior point, the distance to the next point in the chain will be large. As the chain converges onto the boundary points, the distances decrease in value and will more or less level off. This gives a rationale for the proposed cutoff procedure. Because there is a significant decrease in distances, the considered pattern is deemed to be probably an interior point and can be discarded, whereas if the
distances do not decrease too much, then we are probably oscillating around the classification boundary, and the pattern is kept. Below are the precise steps of the condensing method.

Algorithm: TRKNN

Inputs:
  Training set $T$.
  Distance ratio threshold $\alpha$.

Output:
  Reduced training set $T_r$.

Method:
  For each pattern $x_i$ in $T$
    Find its corresponding chain $C_i$
  End For
  For each chain $C_i$
    For $j = 0, 2, 4, \ldots$
      If $d_{ij} > \alpha \cdot d_{i,j+1}$ then mark the pattern $x_{ij}$
    End For
  End For
  Drop all marked patterns from $T$.

Fig. 1 shows an example that illustrates the working of the algorithm. The closest pattern of a different class to pattern $x_1$ is $x_{11}$, and the distance between them is $d_{11}$. Similarly, the closest pattern of a different class to pattern $x_{11}$ is $x_{12}$, and the distance between them is $d_{12}$. Now $x_1$ is dropped if $d_{11} > \alpha \cdot d_{12}$.
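For illustration, the following Python sketch follows the pseudocode and the chain definition above; the names build_chain and trknn_condense, the default value of alpha, the use of SciPy's cdist for the one-time distance matrix, and the guard against ties in the chain loop are our own choices and are not prescribed by the brief (the sketch also assumes that at least two classes are present).

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_chain(i, D, y):
    """Nearest neighbor chain rooted at pattern i (Section II definition):
    links alternate between the root's class and the other classes."""
    chain = [i]
    want_other = True                      # first link: nearest pattern of another class
    while True:
        cur = chain[-1]
        mask = (y != y[i]) if want_other else (y == y[i])
        candidates = np.where(mask)[0]
        nxt = candidates[np.argmin(D[cur, candidates])]
        if nxt in chain:                   # x_{i,k+1} = x_{i,k-1} (or a tie loop): stop
            break
        chain.append(nxt)
        want_other = not want_other
    return chain

def trknn_condense(X, y, alpha=1.4):
    """Return the indices of the patterns kept by the TRKNN condensing rule."""
    D = cdist(X, X)                        # full distance matrix, computed once
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        chain = build_chain(i, D, y)
        d = [D[chain[j], chain[j + 1]] for j in range(len(chain) - 1)]
        # only even positions (same class as the root) may be dropped
        for j in range(0, len(d) - 1, 2):
            if d[j] > alpha * d[j + 1]:
                keep[chain[j]] = False
    return np.where(keep)[0]
```

In use, a call such as keep = trknn_condense(X_train, y_train, alpha=1.4) produces the condensed index set, and the KNN classifier is then run on X_train[keep] and y_train[keep].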
Note that some computational savings can be achieved by observing that some chains encompass other, smaller chains. By operating on the larger chains first, we automatically get the condensing results of the smaller chains contained in them, leading to some computational shortcut. Note also that, unlike many condensing algorithms, the condensed set does not depend on the pattern presentation order. The reason is that at each step the full training set is considered as a whole. Another comment, which also applies to most condensing methods, is that after condensing it is imperative to use a number of neighbors $K$ (for the KNN classification) that is a little smaller than the $K$ used with the whole data set. This is due to the somewhat redundant patterns being removed. This reduction will be bigger if more data is removed.

TABLE I
SUMMARY OF DATA SETS

The proposed algorithm relies on the concept that a pattern near the classification boundary will tend to have a nearest neighbor (from another class) that is fairly close. On the other hand, for an interior pattern that distance will tend to be larger. Based on this concept, we discard the latter patterns. While this concept is fairly intuitive, it would be beneficial to provide some analysis in order to gain insight and to understand the degree and the factors affecting that relationship. Below we develop some approximate analysis.

For simplicity, consider a two-class case. Consider a pattern $x_0$ from class $\omega_1$, and let the dimension of the pattern vector be $D$. Moreover, let $p(x \mid \omega_2)$ denote the class-conditional density for class $\omega_2$, and let there be $N$ training patterns from class $\omega_2$. Finally, let $\mathrm{NN}(x_0)$ denote the nearest neighbor to pattern $x_0$ that is from class $\omega_2$. Let $r$ be the distance from $x_0$ to $\mathrm{NN}(x_0)$. Define the following random variable:

$$u = \int_{\|x - x_0\| \le r} p(x \mid \omega_2)\, dx.$$
TABLE II
SUMMARY OF AVERAGE TRAINING TIME AND TEST TIME IN SECONDS
TABLE III
MEAN NUMBER OF PROTOTYPES (NPROT) AND MEAN TEST ERROR RATES (ERR) OVER THE DIFFERENT FOLDS.
STANDARD DEVIATIONS ARE SHOWN IN BRACKETS
It is well known (see [22] and [23]) that such a variable obeys the following beta density function:

$$p(u) = N(1 - u)^{N-1}, \qquad 0 \le u \le 1.$$

Assume that the number of training patterns from class $\omega_2$ is sufficiently large such that $\mathrm{NN}(x_0)$ is close to $x_0$. Hence, within the ball centered around $x_0$ stretching out to $\mathrm{NN}(x_0)$, the density $p(x \mid \omega_2)$ can be approximated as almost constant, and hence

$$u \approx c \, r^D \, p(x_0 \mid \omega_2)$$

where the term $c \, r^D$ represents the volume of a $D$-dimensional ball of radius $r$, with $c = \pi^{D/2} / \Gamma(1 + D/2)$. Then, we get

$$r \approx \left( \frac{u}{c \, p(x_0 \mid \omega_2)} \right)^{1/D}.$$

The expectation of $r$ is then given by

$$E(r) = \int_0^1 \left( \frac{u}{c \, p(x_0 \mid \omega_2)} \right)^{1/D} N (1 - u)^{N-1} \, du$$

which can be evaluated as

$$E(r) = \frac{\Gamma(N+1)\,\Gamma(1 + 1/D)\,\bigl(\Gamma(1 + D/2)\bigr)^{1/D}}{\sqrt{\pi}\;\Gamma(N + 1 + 1/D)} \; p(x_0 \mid \omega_2)^{-1/D}.$$
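As a quick numerical check of this closed form, the short script below compares it with a Monte Carlo estimate for a uniform class-conditional density on the unit cube, for which $p(x_0 \mid \omega_2) = 1$ away from the edges; the test setup and the function names are illustrative and are not part of the brief.

```python
import numpy as np
from scipy.special import gammaln

def expected_nn_distance(N, D, density):
    """Closed-form E(r) derived above, evaluated in log space for stability."""
    log_val = (gammaln(N + 1) + gammaln(1 + 1 / D) + gammaln(1 + D / 2) / D
               - 0.5 * np.log(np.pi) - gammaln(N + 1 + 1 / D))
    return np.exp(log_val) * density ** (-1.0 / D)

def simulated_nn_distance(N, D, trials=2000, seed=0):
    """Monte Carlo E(r): nearest distance from the cube center to N uniform samples."""
    rng = np.random.default_rng(seed)
    x0 = np.full(D, 0.5)
    d = [np.linalg.norm(rng.random((N, D)) - x0, axis=1).min() for _ in range(trials)]
    return float(np.mean(d))

# For, e.g., N = 500 and D = 2 the two estimates agree to within a few percent.
print(expected_nn_distance(500, 2, 1.0), simulated_nn_distance(500, 2))
```

For moderate $N$ and small $D$, the two numbers are close, which supports the approximation of the density as constant inside the nearest neighbor ball.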
The previous relation confirms the fact that the distance to the nearest neighbor from the other class is small if we are close to the classification boundary (where the opposing class-conditional density $p(x \mid \omega_2)$ would be high). Conversely, that distance would be large if $p(x \mid \omega_2)$ was small (signifying that we are in the interior and far away from the boundary). Moreover, that monotone relationship decays more slowly for large-dimensional problems. One might contemplate the situation when we are near the boundary but $p(x \mid \omega_2)$ is still small. This situation arises when the other class-conditional density $p(x \mid \omega_1)$ is also small, that is, we operate in a relatively sparse area of the space. To compensate for that, the algorithm uses the other distances in the chain to have a relative cutoff point. That is, we discard patterns based on comparing them according to the successive distances in the chain (i.e., when $d_{ij} > \alpha \cdot d_{i,j+1}$).

IV. EXPERIMENTAL RESULTS

To validate the proposed algorithm, we compared it with the traditional KNN and some of the condensing methods, namely DROP2 [12] and IKNN [14], on several real-world data sets. Note that, as we focus on comparing condensing methods, we do not employ DROP3, DROP4, or DROP5 [12], as these just add preprocessing steps that can be applied to any method. On the other hand, DROP1's accuracy is significantly low compared to KNN and DROP2. Therefore, to attain a fair comparison, DROP2 is included in the comparison. We use the fivefold validation procedure for the purpose of tuning the key parameters of each method. In this approach, the training set is partitioned into five equal parts. Training is performed on four parts and validated on the fifth part. Then the validation part is rotated and training is performed again. The process is repeated five times, and the validation classification error on all five parts is combined. The parameter values that yield the best validation results are then chosen for testing the performance. The tuned parameters are as follows. In all methods, the suggested values for $K$ are the odd values from 1 to 9. For IKNN, the suggested values for the attractive capacity threshold ($S$) are $[0.01, 0.05, 0.1] \times N_{\min}$, where $N_{\min}$ is the minimum number of patterns corresponding to the same class, while the suggested form for the portion function is $\lambda (t + 1)^{-0.5}$, with $\lambda \in \{0.1, 0.2\}$ (for more details and a description of the parameters, see [14]). For TRKNN, the suggested values for $\alpha$ are 1.2, 1.4, and 1.6. The distance metric used is the Euclidean distance for all methods. Concerning the DROP2 method, there are no tunable parameters.
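The following sketch shows how such a tuning loop might look for TRKNN; it assumes the trknn_condense function from the earlier sketch, uses scikit-learn's KNeighborsClassifier for the final KNN vote, and treats the stratified splitting and the exact grids as illustrative choices rather than the brief's exact protocol.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

def tune_trknn(X, y, k_grid=(1, 3, 5, 7, 9), alpha_grid=(1.2, 1.4, 1.6)):
    """Fivefold validation over (K, alpha); returns the best pair and its error."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    best, best_err = None, np.inf
    for k, alpha in product(k_grid, alpha_grid):
        fold_err = []
        for tr, va in cv.split(X, y):
            keep = trknn_condense(X[tr], y[tr], alpha=alpha)   # condense the training part
            knn = KNeighborsClassifier(n_neighbors=min(k, len(keep)))
            knn.fit(X[tr][keep], y[tr][keep])
            fold_err.append(1.0 - knn.score(X[va], y[va]))     # fold validation error
        if np.mean(fold_err) < best_err:
            best, best_err = (k, alpha), float(np.mean(fold_err))
    return best, best_err
```

The same loop applies to the other methods by swapping the condensing step; for DROP2, only $K$ would remain to be tuned, since it has no condensing parameter.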
There are four main performance measures. The training time represents the time it takes to condense the training set, including searching for the optimal parameters, such as $K$ and the others. (However, it is computed as the average time per tested parameter set. This way we avoid penalizing methods that have a larger number of parameters, such as the competing IKNN, or have a finer parameter grid.) The testing time
TABLE IV
COMBINED 5 × 2 CROSS-VALIDATION F TEST FOR THE TEST CLASSIFICATION ERROR (ERR) AND THE NUMBER OF PROTOTYPES (NPROT). REJECTION DECISION OF THE NULL HYPOTHESIS IS SHOWN IN BRACKETS
Fig. 2. Test classification error versus the number of prototypes over the 5 × 2 folds for the Pima Indians data set.

Fig. 3. Test classification error versus the number of prototypes over the 5 × 2 folds for the Breast Cancer data set.
Fig. 4. Test classification error versus the number of prototypes over the 5 × 2 folds for the Balance Scale data set.

Fig. 5. Test classification error versus the number of prototypes over the 5 × 2 folds for the Landsat data set.
This happens while the test errors for DROP2 and TRKNN are comparable. One can see the clear separation in the NPROT dimension, while in the test error dimension the data overlap. It seems that possibly for larger data sets DROP2 does not prune out enough points (which paradoxically are the type of problems where we need to drop points the most).

Concerning the training time (Table II), the time of TRKNN is considerably shorter than that of IKNN, by a factor of around 2 or 3 times. This is because the training time of TRKNN is dominated by the computation of the distance matrix (the matrix that holds the distances between pairs of all training patterns), whose complexity is $O(n^2)$, where $n$ is the training set size. This distance matrix is computed only once at the beginning of the training process. For IKNN, besides the computation of the distance matrix, there is an extra computation at each iteration for the evaluation and sorting of the attractive capacities, with complexity $O(nA \log A)$, where $A$ is the average attractive capacity. For large data sets, this attractive capacity $A$ could be rather large. This

[6] D. L. Wilson, "Asymptotic properties of nearest neighbor rules using edited data," IEEE Trans. Syst. Man Cybern., vol. SMC-2, no. 3, pp. 408–420, Jul. 1972.
[7] P. A. Devijver and J. Kittler, "On the edited nearest neighbor rule," in Proc. 5th Int. Conf. Pattern Recognit., Miami, FL, 1980, pp. 72–80.
[8] F. J. Ferri and E. Vidal, "Colour image segmentation and labeling through multiedit-condensing," Pattern Recognit. Lett., vol. 13, pp. 561–568, 1992.
[9] P. E. Hart, "The condensed nearest neighbor rule," IEEE Trans. Inf. Theory, vol. IT-14, no. 3, pp. 515–516, May 1968.
[10] W. Gates, "The reduced nearest neighbor rule," IEEE Trans. Inf. Theory, vol. IT-18, no. 3, pp. 431–433, May 1972.
[11] B. K. Bhattacharya, R. S. Poulsen, and G. T. Toussaint, "Application of proximity graphs to editing nearest neighbor decision rules," in Proc. 16th Symp. Interface Between Comput. Sci. Statist., 1984, pp. 97–108.
[12] D. R. Wilson and T. R. Martinez, "Reduction techniques for instance-based learning algorithms," Mach. Learn., vol. 38, no. 3, pp. 257–286, 2000.
[13] R. A. Mollineda, F. J. Ferri, and E. Vidal, "An efficient prototype merging strategy for the condensed 1-NN rule through class-con-
Authorized licensed use limited to: IEEE Xplore. Downloaded on May 6, 2009 at 11:00 from IEEE Xplore. Restrictions apply.