


Discriminant Pattern Recognition Using
Transformation Invariant Neurons
Diego Sona Alessandro Sperduti Antonina Starita

Dipartimento di Informatica, Università di Pisa


Corso Italia, 40, 56125, Pisa, Italy
e-mail: {sona,perso,starita}@di.unipi.it

Abstract
To overcome the problem of invariant pattern recognition, Simard et al. proposed a successful nearest-neighbor approach based on tangent distance, attaining state-of-the-art accuracy. Since this approach requires a large amount of computation and memory, Hastie et al. proposed an algorithm (HSS) based on Singular Value Decomposition (SVD) for the generation of non-discriminant tangent models.
In this paper we propose a different approach, based on a gradient descent constructive algorithm called TD-Neuron, that develops discriminant models. We also present comparative results of our constructive algorithm versus the HSS and LVQ algorithms. Specifically, we tested the HSS algorithm using both the original version based on the two-sided tangent distance and a new version based on the one-sided tangent distance. Empirical results over the NIST-3 database show that the TD-Neuron is superior to both the SVD- and LVQ-based algorithms, since it reaches a better trade-off between error and rejection.

1 Introduction
In several pattern recognition systems the principal and most desired feature is robustness against transformations of the patterns. Simard et al. (Simard, LeCun, and Denker, 1993) partially solved this problem by proposing the tangent distance as a classification function invariant to small transformations. They used the concept in a nearest neighbor algorithm, achieving state-of-the-art accuracy on isolated handwritten character recognition. However, this approach has a quite high computational complexity, due to the large number of Euclidean and tangent distances that need to be computed.
Different researchers have shown how such complexity can be reduced at the cost
of increased space complexity. Simard (Simard, 1994) proposed a filtering method
based on multi-resolution and on a hierarchy of distances, while Sperduti and Stork
(Sperduti and Stork, 1995) devised a graph based method for rapid and accurate
search through prototypes.
Different approaches to the problem, aiming at reducing the classification time and space requirements while trying to preserve the same accuracy, have been studied by several authors. Specifically, Hastie et al. (Hastie, Simard, and Säckinger,
1995) developed rich models for representing large subsets of the prototypes through
a Singular Value Decomposition (SVD) based algorithm, while Schwenk & Milgram
(Schwenk and Milgram, 1995b) proposed a modular classification system (Diabolo)
based on several auto-associative multi-layer perceptrons, which use tangent distance as the reconstruction error measure. A different but related approach has been pursued by Hinton et al. (Hinton, Dayan, and Revow, 1997), who propose two different methods for modeling the manifolds of the data. Both methods are based on locally linear low-dimensional approximations to the underlying data manifolds.
All the above models are non-discriminant1. Although non-discriminant models have some advantages over discriminant models, as discussed in (Hinton, Dayan, and Revow, 1997), the amount of computation during recognition is usually higher for non-discriminant models, especially if a good trade-off between error and rejection is required. On the other hand, discriminant models take more time to be trained. In several applications, however, it is more convenient to spend extra time on training, which is usually performed only once or a few times, so as to have a faster recognition process, which is repeated millions of times. In these cases, discriminant models should be preferred.
In this paper, we discuss a constructive algorithm for the generation of discriminant2 tangent models. The proposed algorithm, which is an improved version of the algorithm previously presented in (Sona, Sperduti, and Starita, 1997), is based on the definition of the TD-Neuron (TD stands for Tangent Distance), where the net input is computed by using the one-sided tangent distance instead of the standard dot product. Using this definition we have devised a constructive algorithm, which we compare here with the HSS and LVQ algorithms. In particular, we report results obtained for the HSS algorithm using both the original version based on the two-sided tangent distance and a new version based on the one-sided tangent distance. For the sake of comparison, we also present the results of the LVQ2.1 algorithm, which turned out to be the best among the LVQ algorithms.
The one-sided version of the HSS algorithm was derived in order to have a fair
comparison against the TD-Neuron, which exploits the one-sided tangent distance.
Empirical results over the NIST-3 database of handwritten digits show that the
TD-Neuron is superior to both HSS algorithms and LVQ algorithms since it reaches
a better trade-off between error and rejection. More surprisingly, our results show
that the one-sided version of the HSS algorithm is superior to the two-sided version,
which performs poorly when introducing a rejection class. An additional advantage
of the proposed algorithm is the constructive approach.
The paper is organized as follows. In Sections 2 and 3 we give an overview of tangent distance and tangent distance models, respectively. In Section 3 we also define a novel version of the HSS algorithm, based on the one-sided tangent distance. A new formulation for discriminant tangent distance models is proposed in Section 4, while the proposed TD-Neuron model is presented in Section 5, which also includes details on the training algorithm. Comparative empirical results on a handwritten digit recognition task, comparing our algorithm with the HSS, LVQ, and nearest neighbor (Euclidean distance) algorithms, are presented in Section 6. Finally, a discussion of the results and conclusions are reported in Section 7.
1 Schwenk & Milgram proposed a discriminant version of Diabolo (Schwenk and Milgram, 1995a) as well.
2 In the sense that the model for each class is generated by also taking into account negative examples, i.e., patterns belonging to the other classes.

2 Tangent Distance Overview
Let us consider a pattern recognition problem where invariance with respect to a set of n different transformations is required. Given an image X_i, the function X_i(θ) is a manifold of at most n dimensions, representing the set of patterns that can be obtained by transforming the original image through the chosen transformations, where θ is the amount of transformation and X_i = X_i(0). Ideally, one would use the transformation-invariant distance

$$D_I(X_i, X_j) = \min_{\alpha,\theta} \| X_i(\alpha) - X_j(\theta) \|.$$

However, the formalization of the manifold equation and, in particular, the computation of the distance between two manifolds is very hard. For this reason, Simard et al. (Simard, LeCun, and Denker, 1993) proposed an approach based on the local linear approximation of the manifold,

$$\tilde{X}_i(\theta) = X_i + \sum_{j=1}^{n} T_j^{X_i} \theta_j,$$

where the T_j^{X_i} are n different tangent vectors at the point X_i(0), which can easily be computed by finite differences. The distance between two manifolds is then approximated by the so-called tangent distance (Simard, LeCun, and Denker, 1993):

$$D_T(X_i, X_j) = \min_{\alpha,\theta} \| \tilde{X}_i(\alpha) - \tilde{X}_j(\theta) \|. \qquad (1)$$

Of course, the approximation is accurate only for local transformations; however, in character recognition problems global invariance may not be desired, since it can cause confusion between patterns such as "n" and "u".
The tangent distance defined by equation (1) is called the two-sided tangent distance, since it is computed between two subspaces. There is also a less computationally expensive version, called the one-sided tangent distance (Schwenk and Milgram, 1995a), where the distance is computed between a subspace and a pattern:

$$D_T^{\text{1-sided}}(X_i, X_j) = \min_{\alpha} \| \tilde{X}_i(\alpha) - X_j \|. \qquad (2)$$
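As an illustration of how the tangent vectors T_j^{X_i} can be obtained by finite differences (this sketch is not from the original paper; the chosen transformations, step sizes, and helper names are illustrative assumptions), each elementary transformation is applied with a small amplitude and the difference from the original image is taken:

```python
import numpy as np
from scipy.ndimage import shift, rotate  # small geometric transformations

def tangent_vectors(image, eps=1.0, angle=1.0):
    """Approximate tangent vectors of a 2-D image by finite differences.

    Returns one flattened vector per elementary transformation:
    translation along x and y, and a small rotation.  `eps` (pixels) and
    `angle` (degrees) are illustrative step sizes, not values from the paper.
    """
    t_x = (shift(image, (0, eps), order=1) - image).ravel() / eps
    t_y = (shift(image, (eps, 0), order=1) - image).ravel() / eps
    t_rot = (rotate(image, angle, reshape=False, order=1) - image).ravel() / angle
    return np.stack([t_x, t_y, t_rot])

# toy usage: a random array stands in for a 16x16 grey-level digit
rng = np.random.default_rng(0)
T = tangent_vectors(rng.random((16, 16)))
print(T.shape)  # (3, 256): one tangent vector per transformation
```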

3 Tangent Distance Models


The main drawback of tangent distance is its high computational cost compared with the Euclidean distance. For this reason several authors have tried to devise compact models, based on tangent distance, able to summarize the relevant information conveyed by a set of patterns.
Specifically, to address this problem, Hastie et al. (Hastie, Simard, and Säckinger,
1995) proposed an algorithm for the generation of rich models representing large
subsets of patterns.
Given a set of patterns {X_1, ..., X_{N_C}} of class C, Hastie et al. (Hastie, Simard, and Säckinger, 1995) proposed the tangent subspace model

$$M(\theta) = W + \sum_{i=1}^{n} T_i \theta_i,$$

where W is the centroid and the set {T_i} constitutes the associated invariant subspace of dimension n.
According to this definition, for each class C the model M_C can be computed as

$$M_C = \arg\min_{M} \sum_{p=1}^{N_C} \min_{\theta_p, \alpha_p} \| M(\theta_p) - X_p(\alpha_p) \|^2, \qquad (3)$$

minimizing the error function over W and the T_i.
The above definition constitutes a difficult optimization problem, which however
can be solved for a fixed value of n (i.e., the subspace dimension) by an iterative
algorithm based on Singular Value Decomposition, proposed by Hastie et al. (Hastie,
Simard, and Säckinger, 1995).
Note that, if the problem is formulated using the one-sided tangent distance, then equation (3) becomes

$$M_C = \arg\min_{M} \sum_{p=1}^{N_C} \min_{\theta_p} \| M(\theta_p) - X_p \|^2, \qquad (4)$$

which can easily be solved by principal component analysis, also known as the Karhunen-Loève expansion. In fact, equation (4) can be minimized by choosing W as the average over all available samples X_p, and the T_i as the most representative eigenvectors (principal components) of the covariance matrix

$$\Sigma = \frac{1}{N_C} \sum_{p=1}^{N_C} (X_p - W)(X_p - W)^T.$$
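As a concrete reading of equation (4), the one-sided model of a class can be built directly from the class samples: the centroid is the sample mean and the tangent vectors are the leading eigenvectors of Σ. The following NumPy sketch (illustrative only; the function name and the choice of n are not from the paper) shows this construction:

```python
import numpy as np

def one_sided_hss_model(X, n):
    """Build a one-sided tangent model from the patterns of a single class.

    X : array of shape (N_C, d), one flattened pattern per row
    n : dimension of the tangent subspace
    Returns the centroid W (d,) and the tangent vectors T (n, d),
    i.e. the top-n principal components of the class.
    """
    W = X.mean(axis=0)                      # centroid = class average
    D = X - W                               # differences from the centroid
    cov = D.T @ D / X.shape[0]              # covariance matrix Sigma
    eigval, eigvec = np.linalg.eigh(cov)    # eigenvalues in ascending order
    T = eigvec[:, ::-1][:, :n].T            # n most representative eigenvectors
    return W, T

# toy usage with random data standing in for flattened 16x16 digits
rng = np.random.default_rng(0)
W, T = one_sided_hss_model(rng.random((100, 256)), n=5)
print(W.shape, T.shape)  # (256,) (5, 256)
```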

In the following, we will refer to the two versions of the algorithm as HSS, and when necessary we will specify which one is used (one-sided or two-sided).
It must be observed that, by construction, the HSS algorithms return non-discriminant models. In fact, they use only the evidence provided by positive examples of the target class.
Moreover, the two-sided HSS algorithm can be used only if a priori knowledge on invariant transformations is present. If this knowledge is not present, the introduction of invariance with respect to an arbitrary transformation can be risky, since this can remove information relevant for the classification task. In this situation it is preferable to use the one-sided version, which does not commit to any specific transformation.

4 A General Formulation
As discussed in the introduction, there are good reasons for using discriminant models. Although Schwenk & Milgram suggested how to modify the learning rule of Diabolo to obtain discriminant models, they never proposed a formalization of discriminant models using tangent distance. In this section we present a general formulation which allows the user to develop discriminant or non-discriminant tangent models.
To be able to devise discriminant models, equation (3) must be modified so as to take into account that all available data must be used during the generation process. The basic idea is to define a model for class C which minimizes the tangent distances from patterns belonging to C, and maximizes the tangent distances from
patterns not in C (i.e., in the complement class $\bar{C}$). Mathematically this can be expressed, for each class C, by

$$M_C = \arg\min_{M} \left[ \sum_{p=1}^{N_C} D_T(M, X_p^{C}) \; - \; \lambda \sum_{p=1}^{N_{\bar{C}}} D_T(M, X_p^{\bar{C}}) \right], \qquad (5)$$

where M is the generic model {W, T_1, ..., T_n}, N_C is the number of patterns X_p^C belonging to class C, and N_{\bar{C}} is the number of patterns X_p^{\bar{C}} not belonging to C.
Note that the second sum is multiplied by a constant λ, which determines how discriminant the model should be. If λ = 0, equation (5) reduces to equation (3) (or to (4) when considering the one-sided tangent distance). On the other hand, if λ is large, the resulting model may not be a good descriptive model for class C. In any case, no bounded solution to equation (5) may exist if the term associated with λ is not bounded.

5 TD-Neuron
The TD-Neuron (short for Tangent Distance Neuron) is so called since it can be considered a neural computational unit which computes, as its net input, the square of the one-sided tangent distance of the input vector X_k from a prototype model defined by a set of internal parameters (weights).
Specifically, it is characterized by a set of n + 1 vectors of the same dimension as the input vectors. One vector (W) is used as the reference vector (centroid), while the remaining vectors {T_1, ..., T_n} are used as tangent vectors. Moreover, the set of tangent vectors constitutes an orthonormal basis.
This set of parameters is organized so as to form a tangent model. Formally, the net input of a TD-Neuron for a pattern k is

$$\text{net}_k = \min_{\theta} \| M(\theta) - X_k \|^2 + \beta, \qquad (6)$$
where β is the offset. A good model should return a small net input for patterns belonging to the learned class.
Since the tangent vectors constitute an orthonormal basis, equation (6) can be computed exactly and easily by using the projections of the input vector onto the model subspace (see Figure 1):

$$\text{net}_k = \| \underbrace{X_k - W}_{d_k} \|^2 - \sum_{i=1}^{n} \left[ (X_k - W)^t T_i \right]^2 + \beta = d_k^t d_k - \sum_{i=1}^{n} \big[ \underbrace{d_k^t T_i}_{\gamma_{ik}} \big]^2 + \beta, \qquad (7)$$

where, for notational convenience, d_k denotes the difference between the input pattern X_k and the centroid W, and the projection of d_k onto the i-th tangent vector is denoted by γ_ik.
Note that the right-hand side of equation (7) mainly involves dot products, just as in a standard neuron.

Figure 1: Geometric interpretation of equation (7). W and the T_i span the invariance manifold, d is the Euclidean distance between the pattern X and the centroid W, and net = (D_T^{1-sided})^2 is the squared one-sided tangent distance.

The output of the TD-Neuron is then computed by transforming the net input through a nonlinear monotone function f. In our experiments we have used the symmetric sigmoidal function

$$o_k = \frac{2}{1 + e^{\text{net}_k}} - 1. \qquad (8)$$

We have used a monotonically decreasing function so that the output corresponding to patterns belonging to the target class will be close to 1.
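To make equations (6)-(8) concrete, the following sketch (illustrative only; the variable and function names are not from the paper) computes the net input by projecting the difference vector onto the orthonormal tangent vectors and then applies the decreasing sigmoid:

```python
import numpy as np

def td_neuron_output(x, W, T, beta):
    """Forward pass of a TD-Neuron, equations (6)-(8).

    x : input pattern (d,);  W : centroid (d,)
    T : orthonormal tangent vectors (n, d);  beta : offset
    """
    d = x - W                                   # difference from the centroid
    gamma = T @ d                               # projections gamma_i = d^t T_i
    net = d @ d - np.sum(gamma ** 2) + beta     # squared one-sided tangent distance + offset
    o = 2.0 / (1.0 + np.exp(net)) - 1.0         # decreasing sigmoid, equation (8)
    return net, o

# toy usage: 3 orthonormal tangent vectors obtained from a QR decomposition
rng = np.random.default_rng(1)
W = rng.random(256)
Q, _ = np.linalg.qr(rng.standard_normal((256, 3)))
print(td_neuron_output(W + 0.1 * rng.standard_normal(256), W, Q.T, beta=0.0))
```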

5.1 Training the TD-Neuron


A discriminant model based on the TD-Neuron can be obtained by adapting equation (5).
Given a training set {(X_1, t_1), ..., (X_N, t_N)}, where

$$t_i = \begin{cases} 1 & \text{if } X_i \in C \\ -1 & \text{if } X_i \in \bar{C} \end{cases}$$

is the i-th desired output for the TD-Neuron, and N = N_C + N_{\bar{C}} is the total number of patterns in the training set, an error function can be defined as

$$E = \frac{1}{2} \sum_{k=1}^{N} (t_k - o_k)^2, \qquad (9)$$

where o_k is the output of the TD-Neuron for the k-th input pattern.
Equation (9) can be put into a form similar to equation (5) by making the target values t_i explicit, by splitting the sum so as to group together patterns belonging to C and patterns belonging to $\bar{C}$, and by weighting the negative examples with a constant λ ≥ 0:

$$E = \frac{1}{2} \left[ \sum_{k=1}^{N_C} (1 - o_k)^2 + \lambda \sum_{j=1}^{N_{\bar{C}}} (1 + o_j)^2 \right]. \qquad (10)$$

In our experiments we have chosen λ = N_C / N_{\bar{C}}, thus balancing the strength of patterns belonging to C and those belonging to $\bar{C}$.
Using equations (7)-(8), it is trivial to compute the changes for the centroid, the tangent vectors, and the offset by using a gradient descent approach over equation (10):

$$\Delta W = -\eta \frac{\partial E}{\partial W} = -2\eta \sum_{k=1}^{N} \left[ (t_k - o_k)\, f'_k \left( d_k - \sum_{i=1}^{n} \gamma_{ik} T_i \right) \right] \qquad (11)$$

$$\Delta T_i = -\eta \frac{\partial E}{\partial T_i} = -2\eta \sum_{k=1}^{N} \left[ (t_k - o_k)\, f'_k\, \gamma_{ik}\, d_k \right] \qquad (12)$$

$$\Delta \beta = -\eta_\beta \frac{\partial E}{\partial \beta} = \eta_\beta \sum_{k=1}^{N} \left[ (t_k - o_k)\, f'_k \right] \qquad (13)$$

where η and η_β are learning parameters, and f'_k = ∂o_k/∂net_k.
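A minimal sketch of one batch update following equations (11)-(13) is given below (illustrative only; the learning rates are arbitrary placeholders). For the sigmoid (8), the derivative is f'_k = -(1 - o_k^2)/2, which is used directly in the code:

```python
import numpy as np

def td_neuron_batch_update(X, t, W, T, beta, eta=0.01, eta_beta=0.01):
    """One gradient-descent step over equations (11)-(13).

    X : patterns (N, d);  t : targets in {+1, -1} (N,)
    W : centroid (d,);  T : orthonormal tangent vectors (n, d);  beta : offset
    Returns updated copies of W, T and beta.
    """
    D = X - W                                    # d_k for every pattern, (N, d)
    gamma = D @ T.T                              # gamma_ik, (N, n)
    net = np.sum(D * D, axis=1) - np.sum(gamma ** 2, axis=1) + beta
    o = 2.0 / (1.0 + np.exp(net)) - 1.0          # equation (8)
    fprime = -(1.0 - o ** 2) / 2.0               # derivative of equation (8)
    err = (t - o) * fprime                       # common factor (t_k - o_k) f'_k

    dW = -2.0 * eta * (err[:, None] * (D - gamma @ T)).sum(axis=0)   # eq. (11)
    dT = -2.0 * eta * (err[:, None] * gamma).T @ D                   # eq. (12)
    dbeta = eta_beta * err.sum()                                     # eq. (13)
    return W + dW, T + dT, beta + dbeta
```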
Before training the TD-Neuron by gradient descent, however, the tangent subspace dimension must be decided. To solve this problem we developed a constructive algorithm which adds tangent vectors one by one, according to the computational needs. This idea is also justified by the observation that using equations (11)-(13) leads to the sequential convergence of the tangent vectors according to their relative importance.
This means that, to a first approximation, all the tangent vectors remain random vectors while the centroid converges first. Then one of the tangent vectors converges to the most relevant transformation (while the remaining tangent vectors are still immature), and so on until all the tangent vectors converge, one by one, to less and less relevant transformations.
This behavior suggests starting the training using only the centroid (i.e., without tangent vectors) and then adding tangent vectors as needed. Under this learning scheme, since there are no tangents when the centroid is computed, equation (11) becomes

$$\Delta W = -\eta \frac{\partial E}{\partial W} = -2\eta \sum_{k=1}^{N} \left[ (t_k - o_k)\, f'_k\, d_k \right]. \qquad (14)$$

The constructive algorithm is composed of two phases (see Table 1). First there is the centroid computation, based on the iterative use of equations (13) and (14). Then the centroid is frozen, and one by one all tangent vectors T_i are trained using equations (12) and (13). At each iteration of the learning phase the tangent vector T_i must be orthonormalized with respect to the already computed tangent vectors3. If, after a fixed number of iterations (we used 300 iterations), the total error variation is less than a fixed threshold (0.01%) and no change in the classification performance over the training set occurs, a new tangent vector is added.
3 The computational complexity of the orthonormalization is linear in the number of already computed tangent vectors.

CONSTRUCTIVE ALGORITHM

  Initialize the centroid W
  Update β and W by equations (13) and (14) until they converge
  Freeze W
  REPEAT
      Initialize a new tangent vector T_i
      Update T_i and β with equations (12) and (13), and
          orthonormalize T_i with respect to {T_1, ..., T_{i-1}},
          until it converges
      Freeze T_i
  UNTIL the new T_i gives little accuracy change

Table 1: The constructive algorithm for the TD-Neuron.

The tangent vectors are iteratively added until changes in the classification accuracy become irrelevant.
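Purely as an illustration of the control flow of Table 1, the skeleton below combines the update step sketched above (assumed available as td_neuron_batch_update) with a Gram-Schmidt orthonormalization; the iteration counts, the convergence tests, and the random initialization of the new tangent vectors are placeholders, not the exact criteria of the paper (300 iterations, 0.01% error variation, unchanged training accuracy):

```python
import numpy as np

def orthonormalize(v, basis):
    """Gram-Schmidt: make v orthonormal to the rows of `basis`."""
    for b in basis:
        v = v - (v @ b) * b
    return v / np.linalg.norm(v)

def train_td_neuron(X, t, max_tangents=15, iters=300, eta=0.01):
    """Constructive training loop of Table 1 (sketch, simplified stopping rules)."""
    d = X.shape[1]
    W = X[t > 0].mean(axis=0)              # centroid initialized with the positive-class mean
    T = np.empty((0, d))
    beta = 0.0
    for _ in range(iters):                 # phase 1: centroid and offset only (eqs. 13-14)
        W, T, beta = td_neuron_batch_update(X, t, W, T, beta, eta=eta)
    for i in range(max_tangents):          # phase 2: add and train tangent vectors one by one
        v = np.random.default_rng(i).standard_normal(d)   # random init (Table 2 gives a better one)
        T = np.vstack([T, orthonormalize(v, T)])
        for _ in range(iters):
            _, T_new, beta = td_neuron_batch_update(X, t, W, T, beta, eta=eta)
            T[-1] = orthonormalize(T_new[-1], T[:-1])      # keep W and the older T_i frozen
        # a full implementation would stop when the classification accuracy no longer improves
    return W, T, beta
```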
The initialization of the internal vectors of the TD-Neuron can be done in many different ways. On the basis of empirical evidence, we have concluded that the learning phase of the centroid can be considerably shortened by initializing the centroid with the mean value of the patterns belonging to the positive class. We have also devised a "good" initialization algorithm for the tangent vectors (see Table 2), which tries to minimize the drop in the net input for all the patterns due to the increase in the tangent subspace dimension (see Figure 2). This is obtained by introducing a new tangent vector which mainly spans the residual subspace between the patterns in the positive class and the current model. In this way, patterns in the negative class are only mildly affected by the newly introduced tangent vector.
We have also devised a "better" initialization algorithm, based on the principal components of the difference vectors for all classes. However, the observed training speed-up does not justify the additional computational overhead of the SVD computation needed at each tangent vector insertion.
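A sketch of this tangent-vector initialization (see Table 2) in the same NumPy style as the earlier fragments is given below; it is one illustrative reading, in which the difference from the current model M(θ) is taken as the residual after projection onto the current subspace, and the other classes' mean differences are removed one at a time:

```python
import numpy as np

def init_tangent_vector(X, labels, pos_class, W, T):
    """Initialize a new tangent vector following Table 2 (illustrative reading).

    X : all training patterns (N, d);  labels : class label of each pattern
    pos_class : the class C modelled by this TD-Neuron
    W, T : current centroid and orthonormal tangent vectors of the model
    """
    def mean_residual(P):
        D = P - W
        return (D - (D @ T.T) @ T).mean(axis=0)   # mean difference from the current model

    d_pos = mean_residual(X[labels == pos_class])
    for c in np.unique(labels):                   # remove the directions of the other classes
        if c == pos_class:
            continue
        d_c = mean_residual(X[labels == c])
        d_c = d_c / np.linalg.norm(d_c)
        d_pos = d_pos - (d_pos @ d_c) * d_c
    return d_pos / np.linalg.norm(d_pos)
```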
In our experiments, we have also used a simple form of regularization over the parameters (weight decay with penalty equal to 0.997, i.e., all parameters are multiplied by the penalty before adding the gradient), obtaining better convergence.

6 Results
In order to obtain comparable results, we had to use the same number of parameters for all algorithms. In particular, for tangent based algorithms the number of vectors is given by the number of tangent vectors plus one (the centroid).
Figure 2: Total error variation during the learning phase for pattern '0' (total error, log scale, versus iterations; the insertions of the 1st and 2nd tangent vectors are marked). At each new tangent insertion there is a reduction of the distance between patterns and the model (also for patterns belonging to class $\bar{C}$). This affects the output of the neuron for all patterns, increasing the total output squared error.

TANGENT VECTOR INITIALIZATION

  • for each class c ∈ (C ∪ $\bar{C}$) compute the mean value of the differences between patterns and the model: $d_c = \frac{1}{N_c} \sum_{p=1}^{N_c} (X_p - M(\theta))$;
  • orthonormalize the vector d_C of the class C with respect to the mean difference vectors of all the other classes (those in $\bar{C}$), and return it as the new initial tangent vector.

Table 2: Initialization procedure for the tangent vectors.

For this reason, the LVQ algorithms are compared with the tangent distance based algorithms using a number of reference vectors equal to the number of tangent vectors plus one. Furthermore, with the two-sided HSS algorithm we also have to consider the tangent vectors corresponding to the input patterns; specifically, we used 6 transformations (tangent vectors) for each input pattern: clockwise and counterclockwise rotation, and translation in the four cardinal directions4.
We have tested our constructive algorithm against the two versions of the HSS algorithm and against the LVQ algorithms in the LVQ_PAK package (Kohonen, Hynninen, Kangas, Laaksonen, and Torkkola, 1996) (optimized-learning-rate LVQ1, original LVQ1, LVQ2.1, LVQ3), using 10704 binary digits taken from the NIST-3 dataset.
4 We preferred to approximate the exact tangents by finite differences, since in this way the local shape of the manifold is interpolated in a better way.

Figure 3: The results obtained on the test set by the models generated by both versions of HSS, LVQ2.1, the TD-Neuron, and 1-NN with Euclidean distance (classification performance versus number of tangent vectors; the inset enlarges the range from 5 to 15 tangents).

The binary 128×128 digits were transformed into 64-grey-level 16×16 images by a simple local counting procedure5. The only preprocessing transformation performed was the elimination of empty borders.
5 The original image is partitioned into 16×16 windows and the number of pixels with value equal to 1 is used as the grey value for the corresponding pixel in the new image.
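A sketch of the local counting procedure is given below, reading footnote 5 as a 16×16 grid of 8×8 windows over the 128×128 binary image, so that each grey value is the number of foreground pixels in a window (0-64); this interpretation and the function name are our own assumptions:

```python
import numpy as np

def local_counting(binary_image):
    """Downsample a 128x128 binary image to a 16x16 grey-level image.

    Each output pixel is the count of 1-pixels in the corresponding
    8x8 window, giving grey values in the range 0..64.
    """
    assert binary_image.shape == (128, 128)
    return binary_image.reshape(16, 8, 16, 8).sum(axis=(1, 3))

# toy usage
rng = np.random.default_rng(0)
img = (rng.random((128, 128)) > 0.8).astype(int)
print(local_counting(img).shape)  # (16, 16)
```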
The training set consisted of 5000 randomly chosen digits, while the remaining
digits were used in the test set. For each tangent distance algorithm, a single
tangent model for each class of digit was computed. With the LVQ algorithms, a
set of reference vectors was used for each class; the number of reference vectors was chosen so as to have as many parameters as in the tangent distance algorithms. In particular, we tested all the algorithms based on tangent distance using a different number of vectors in each experiment, starting from 1 vector per class (centroid without tangent vectors) up to 16 vectors per class (centroid plus 15 tangent vectors). The number of reference vectors for the LVQ algorithms was chosen accordingly.
Concerning the LVQ algorithms, here we report only the results obtained using LVQ2.1 with 1-NN based on Euclidean distance as the classification rule, since this algorithm reached the best performance over an extended set of experiments involving the LVQ algorithms with different settings of the learning parameters.
The classification of the test digits was performed using the label of the closest model for HSS, the 1-NN rule for the LVQ algorithms, and the highest output for the TD-Neuron algorithm. For the sake of comparison, we also performed a classification using the nearest neighbor rule (1-NN) with the Euclidean distance as the classification metric; in this case each pattern was classified with the label of the nearest vector in the training set.
In Figure 3 we report the results obtained on the test set for different numbers of tangent vectors for all models. In particular, the best classification result (96.84%) was given by 1-NN with Euclidean distance, followed by the two-sided HSS
with 9 tangent vectors6 (96.6%). With the same number of parameters (15 tangent vectors), the TD-Neuron and the one-sided HSS gave performance rates of 96.51% and 96.42%, respectively. Finally, LVQ2.1 obtained a performance rate of 96.48% using 15 vectors per class.
From these results it can be noted that the two-sided HSS algorithm overfits the data after the 9th tangent vector, while this is not true for the remaining algorithms. Nevertheless, all the models reach a similar performance with the same number of parameters, which is slightly below the performance attained by the 1-NN classifier using Euclidean distance. However, both the tangent models and the LVQ algorithms have the advantage of being less demanding than 1-NN, both in space and in response time.
It is interesting that the recognition performance of the TD-Neuron system is basically monotone in the number of parameters (tangent vectors), while all the other methods exhibit overfitting and high variance in the results.
Although from Figure 3 it seems that the tangent models and LVQ2.1 are equivalent, when a rejection criterion is introduced the model generated by the TD-Neuron outperforms the other algorithms (see Figure 4). We used the same rejection criterion for all algorithms: a pattern is rejected when the difference between the first and the second best outputs belonging to different classes is less than a fixed threshold7. Furthermore, introducing the rejection criterion also leads to the surprising result that the one-sided version of HSS performs better than the two-sided version.
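As a sketch of this rejection rule (illustrative only; `outputs` holds one score per class model, with higher meaning a better match, as for the TD-Neuron):

```python
import numpy as np

def classify_with_rejection(outputs, threshold):
    """Return the predicted class index, or None if the pattern is rejected.

    A pattern is rejected when the gap between the best and the
    second-best class scores is below `threshold`.
    """
    order = np.argsort(outputs)[::-1]       # class indices sorted by decreasing score
    if outputs[order[0]] - outputs[order[1]] < threshold:
        return None                         # rejected
    return int(order[0])

print(classify_with_rejection(np.array([0.90, 0.20, 0.85]), threshold=0.1))  # None (rejected)
```

For the distance-based methods (HSS, LVQ, 1-NN), where lower values mean a better match, the same rule can be applied to the negated distances.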
In order to assess whether the better performance exhibited by the TD-Neuron
was due to the specific rejection criterion or to the discriminant capability of the
model, we performed some experiments using a different rejection criterion: reject
when the best output value is smaller (TD-Neuron) or greater (HSS, LVQ and
1-NN) than a threshold. In Figure 5 we have reported the curves obtained for
the best models, i.e., TD-Neuron and 1-sided HSS, reporting, for comparison, the
corresponding curves of Figure 4.
These results demonstrate that the improvement shown by the TD-Neuron is not tied to a specific rejection criterion. Moreover, when the sigmoidal function is removed from the TD-Neuron during the test phase, the rejection curves do not change significantly for either rejection criterion. Thus we can conclude that the improvement of the TD-Neuron is mainly due to the discriminant training procedure.
In Figure 6, four examples of TD-Neuron models are reported: in the leftmost
column the centroids of patterns ‘0’, ‘1’, ‘2’ and ‘3’ are shown. The remaining
columns contain the first (and most important) four tangent vectors for each model.

7 Discussion and Conclusion


We introduced the tangent distance neuron (TD-Neuron), which implements the
one-sided version of the tangent distance, and gave a constructive learning algorithm
for building a tangent subspace with discriminant capabilities.
As stated in the introduction, there are many advantages in using the proposed
computational model versus the HSS model and LVQ algorithms. Specifically, we
believe that the proposed approach is particularly useful in those applications where
it is very important to have a classification system which is both discriminant and
fast in recognition.
6 There are also 6 tangent vectors corresponding to the input pattern.
7 Obviously, the threshold is different for each algorithm.

Figure 4: Error-rejection curves for all algorithms: 1-sided HSS with 15 tangent vectors, 2-sided HSS with 9 and 15 tangent vectors, TD-Neuron with 9 and 15 tangent vectors, LVQ2.1 with 15 reference vectors, and 1-NN with Euclidean distance (% rejection versus % error). The boxed diagram shows a detail of the curves, demonstrating that the TD-Neuron with 15 tangent vectors has the best trade-off between error and rejection.

In this paper we also compared the TD-Neuron constructive algorithm with two different versions of the HSS algorithm, the LVQ2.1 algorithm, and the 1-NN classification criterion. The results obtained over the NIST-3 database of handwritten digits show that the TD-Neuron is superior to both the HSS algorithms based on Singular Value Decomposition and the LVQ algorithms, since it reaches a better trade-off between error and rejection. Moreover, we have assessed that the better trade-off is mainly due to the discriminant capabilities of our model.
Concerning the proposed neural formulation, we believe that the nonlinear sigmoidal transformation is useful: removing it and training with the inverted targets o_k^{-1}(t_k) would drastically reduce the size of the space of solutions in the weight space. In fact, very high or very low values of the net input would not minimize the error function with the inverted targets, while they are fully acceptable in the proposed model. Moreover, the nonlinearity increases the stability of learning because of the saturated output.
It must be pointed out that, during model generation, for a fixed number of tangent vectors the HSS algorithm is faster than ours, because it needs only a fraction of the training examples (only one class). However, our algorithm is remarkably more efficient than the HSS algorithms when a family of tangent models with an increasing number of tangent vectors must be generated. The LVQ algorithms are also faster than the TD-Neuron, but they have poor rejection performance as a drawback.
Figure 5: Detail of the error-rejection curves of 1-sided HSS and the TD-Neuron for two different rejection criteria: threshold on the output absolute value (abs) and threshold on the difference between the best and the second best output models (diff). The curves for the TD-Neuron model with the nonlinear output removed are indicated by the additional label (linear).

An additional advantage of the TD-Neuron model is that, because the training algorithm is based on a gradient descent technique, several TD-Neurons can be arranged to form a hidden layer in a feed-forward network with standard output neurons, which can be trained by a trivial extension of back-propagation. This may
neurons, which can be trained by a trivial extension of back-propagation. This may
lead to a remarkable increase in the transformation invariant features of the system.
Furthermore, it should be possible to easily extract information from the network
regarding the most important features used during classification (see Figure 6).

References
Hastie, T., Simard, P. Y., and Säckinger, E., 1995. Learning Prototype Models for Tangent Distance. In Advances in Neural Information Processing Systems, eds. G. Tesauro, D. S. Touretzky, and T. K. Leen, vol. 7, pp. 999–1006. Cambridge MA: MIT Press.
Hinton, G. E., Dayan, P., and Revow, M., 1997. Modeling the Manifold of Images
of Handwritten Digits. IEEE Transactions on Neural Networks 8, no. 1:65–74.
Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., and Torkkola, K., 1996. LVQ_PAK: The Learning Vector Quantization Program Package. Technical Report A30, Helsinki University of Technology, Laboratory of Computer and Information Science, Rakentajanaukio 2 C, SF-02150 Espoo, Finland. http://www.cis.hut.fi/nnrc/nnrc-programs.html.


Schwenk, H. and Milgram, M., 1995a. Learning Discriminant Tangent Models for
Handwritten Character Recognition. In International Conference on Artificial
Neural Networks, pp. 985–988. Springer-Verlag.
Schwenk, H. and Milgram, M., 1995b. Transformation Invariant Autoassociation with Application to Handwritten Character Recognition. In Advances in Neural Information Processing Systems, eds. G. Tesauro, D. S. Touretzky, and T. K. Leen, vol. 7, pp. 991–998. Cambridge MA: MIT Press.
Simard, P. Y., 1994. Efficient Computation of Complex Distance Metrics Using Hierarchical Filtering. In Advances in Neural Information Processing Systems, eds. J. D. Cowan, G. Tesauro, and J. Alspector, vol. 6, pp. 168–175. San Francisco CA: Morgan Kaufmann.
Simard, P. Y., LeCun, Y., and Denker, J., 1993. Efficient Pattern Recognition Using
a New Transformation Distance. In Advances in Neural Information Processing
Systems, eds. S. J. Hanson, J. D. Cowan, and C. L. Giles, vol. 5, pp. 50–58. San
Mateo CA: Morgan Kaufmann.
Sona, D., Sperduti, A., and Starita, A., 1997. A Constructive Learning Algorithm for Discriminant Tangent Models. In Advances in Neural Information Processing Systems, eds. M. C. Mozer, M. I. Jordan, and T. Petsche, vol. 9, pp. 786–792. Cambridge MA: MIT Press.
Sperduti, A. and Stork, D. G., 1995. A rapid graph-based method for arbitrary transformation-invariant pattern classification. In Advances in Neural Information Processing Systems, eds. G. Tesauro, D. S. Touretzky, and T. K. Leen, vol. 7, pp. 665–672. Cambridge MA: MIT Press.


