Proceedings of the 11th European Symposium on Artificial Neural Networks (ESANN 2003), pp. 215-222, Bruges, Belgium, 2003

On the Equality of Kernel AdaTron and Sequential Minimal Optimization in Classification and Regression Tasks and Alike Algorithms for Kernel Machines

Vojislav Kecman¹, Michael Vogt², Te Ming Huang¹

¹ School of Engineering, The University of Auckland, Auckland, New Zealand
² Institute of Automatic Control, TU Darmstadt, Darmstadt, Germany
e-mail: [email protected], [email protected]

Abstract: The paper presents the equality of the kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the sequential minimal optimization (SMO) learning algorithm (based on an analytic quadratic programming step) in designing support vector machines (SVMs) with positive definite kernels. The conditions for the equality of the two methods are established. The equality is valid for both nonlinear classification and nonlinear regression tasks, and it sheds new light on these seemingly different learning approaches. The paper also introduces other learning techniques related to the two mentioned approaches, such as the nonnegative conjugate gradient, the classic Gauss-Seidel (GS) coordinate ascent procedure and its derivative known as the successive over-relaxation (SOR) algorithm, as viable and usually faster training algorithms for performing nonlinear classification and regression tasks. The convergence theorem for these related iterative algorithms is proven.

1. Introduction

One of the mainstream research fields in learning from empirical data by support vector machines, for solving both classification and regression problems, is the implementation of incremental learning schemes when the training data set is huge. Among the several candidates that avoid the use of standard quadratic programming (QP) solvers, the two learning approaches that have recently attracted attention are the KA (Anlauf, Biehl, 1989; Frieß, Cristianini, Campbell, 1998; Veropoulos, 2001) and the SMO (Platt, 1998, 1999; Vogt, 2002). Due to its analytic foundation, the SMO approach is particularly popular and at the moment the most widely used, analyzed and still actively developed algorithm. At the same time, the KA, although providing similar results in solving classification problems (in terms of both the accuracy and the training computation time required), did not attract that many devotees. There are two basic reasons for that. First, until recently (Veropoulos, 2001), the KA seemed to be restricted to classification problems only, and second, it 'lacked' the fleur of a strong theory (despite its beautiful 'simplicity' and strong convergence proofs). The KA is based on a gradient ascent technique, and this fact might also have deterred some researchers aware of the problems that gradient ascent approaches face with a possibly ill-conditioned kernel matrix. Here we show when and why the recently developed algorithms for SMO using positive definite kernels, i.e., models without a bias term (Vogt, 2002), and the KA for both classification (Frieß, Cristianini, Campbell, 1998) and regression (Veropoulos, 2001) are identical. Both the KA and the SMO algorithm attempt to solve the following QP problem in the case of classification (Vapnik, 1995; Cherkassky and Mulier, 1998; Cristianini and Shawe-Taylor, 2000; Kecman, 2001; Schölkopf and Smola, 2002): maximize the dual Lagrangian
L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j),    (1)

subject to

\alpha_i \ge 0,\; i = 1, \ldots, l, \quad \text{and} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,    (2)

where l is the number of training data pairs, αi are the dual Lagrange variables, yi are the class labels (±1), and K(xi, xj) are the kernel function values. Because of noise or generic class features, training data points will overlap. In that case, nothing but the constraints changes in solving (1), and they become
0 \le \alpha_i \le C,\; i = 1, \ldots, l, \quad \text{and} \quad \sum_{i=1}^{l} \alpha_i y_i = 0,    (3)
where 0 < C < ∞ is a penalty parameter trading off the size of the margin against the number of misclassifications.
In the case of nonlinear regression, the learning problem is the maximization of the dual Lagrangian below,
L_d(\alpha, \alpha^*) = -\varepsilon \sum_{i=1}^{l} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) y_i - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j),    (4)

s.t. \sum_{i=1}^{l} \alpha_i^* = \sum_{i=1}^{l} \alpha_i,    (4a)

0 \le \alpha_i^* \le C, \quad 0 \le \alpha_i \le C,\; i = 1, \ldots, l,    (4b)
where ε is the prescribed size of the insensitivity zone, and αi and αi* (i = 1, ..., l) are the Lagrange multipliers for the points above and below the regression function, respectively. Learning results in l Lagrange multiplier pairs (αi, αi*). Because no training data point can be on both sides of the tube, at least one of αi and αi* will be zero, i.e., αiαi* = 0.
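To make the classification dual (1)-(3) concrete, the following sketch (our own illustration; the function names, the Gaussian width sigma, and the toy data are ours, not from the paper) builds an RBF kernel matrix and evaluates the dual Lagrangian (1) for a given α:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gaussian RBF kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def dual_lagrangian(alpha, y, K):
    """Classification dual L_d(alpha) of Eq. (1)."""
    H = (y[:, None] * y[None, :]) * K          # H_ij = y_i * y_j * K(x_i, x_j)
    return np.sum(alpha) - 0.5 * alpha @ H @ alpha

# toy data: two Gaussian blobs labelled -1 and +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

K = rbf_kernel_matrix(X, sigma=1.0)
alpha = np.zeros(len(y))                       # alpha = 0 is feasible for (2)/(3)
print(dual_lagrangian(alpha, y, K))            # L_d = 0 at this starting point
```

A positive definite kernel such as this RBF keeps the dual (1) concave, which is what the no-bias algorithms of Section 2 rely on.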

2. The KA and SMO learning algorithms without-bias-term

It is known that positive definite kernels (such as the most popular and most widely used RBF Gaussian kernels, as well as the complete polynomial ones) do not require a bias term (Evgeniou, Pontil, Poggio, 2000). Below, the KA and the SMO algorithms are presented for such a fixed- (i.e., no-) bias design problem and compared for the classification and regression cases. The equality of the two learning schemes and of the resulting models is established. Originally, in (Platt, 1998, 1999), the SMO classification algorithm was developed for solving the problem (1) including the constraints related to the bias b. In these early publications the case when the bias b is a fixed variable was also mentioned, but a detailed analysis of a fixed-bias update was not carried out.
2.1 Incremental Learning in Classification

a) Kernel AdaTron in classification


The classic AdaTron algorithm as given in (Anlauf and Biehl, 1989) was developed for a linear classifier. The KA is a variant of the classic AdaTron algorithm in the feature space of SVMs (Frieß et al., 1998). The KA algorithm solves the maximization of the dual Lagrangian (1) by implementing a gradient ascent procedure. The update Δαi of the dual variables αi is given as
\Delta\alpha_i = \eta \frac{\partial L_d}{\partial \alpha_i} = \eta \left( 1 - y_i \sum_{j=1}^{l} \alpha_j y_j K(x_i, x_j) \right) = \eta (1 - y_i f_i),    (5a)
where fi is the value of the decision function f at the point xi, i.e., f_i = \sum_{j=1}^{l} \alpha_j y_j K(x_i, x_j), and yi denotes the desired target (or class label), which is either +1 or -1. The update of the dual variables αi is given as

\alpha_i \leftarrow \min(\max(0, \alpha_i + \Delta\alpha_i), C),\quad i = 1, \ldots, l.    (5b)

In other words, the dual variables αi are clipped to zero if (αi + Δαi) < 0. In the case of the soft nonlinear classifier (C < ∞), the αi are clipped between zero and C (0 ≤ αi ≤ C). The algorithm converges from any initial setting of the Lagrange multipliers αi.
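As an illustration only (not the authors' code), a minimal sketch of the KA loop (5a)-(5b) in the no-bias setting might look as follows; the epoch count and the default per-sample rate ηi = 1/K(xi, xi) are our own choices:

```python
import numpy as np

def kernel_adatron_classification(K, y, C=1.0, eta=None, n_epochs=100):
    """KA updates (5a)-(5b) for the no-bias classifier.
    K  : (l, l) positive definite kernel matrix
    y  : labels in {-1, +1}
    eta: learning rate; eta_i = 1 / K_ii reproduces the SMO step (6)."""
    l = len(y)
    alpha = np.zeros(l)
    if eta is None:
        eta = 1.0 / np.diag(K)                      # per-sample optimal rate
    eta = np.broadcast_to(eta, (l,))
    for _ in range(n_epochs):
        for i in range(l):
            f_i = np.sum(alpha * y * K[:, i])       # decision function at x_i
            delta = eta[i] * (1.0 - y[i] * f_i)     # Eq. (5a)
            alpha[i] = min(max(0.0, alpha[i] + delta), C)   # clipping, Eq. (5b)
    return alpha
```

With the default ηi = 1/Kii each pass produces exactly the SMO step (6) discussed next.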

b) SMO without-bias-term in classification


Recently, (Vogt, 2002) derived the update rule for the multipliers αi, including a detailed analysis of the Karush-Kuhn-Tucker (KKT) conditions for checking the optimality of the solution. (As mentioned above, a fixed-bias update was referred to in Platt's papers.) The following update rule for αi in a no-bias SMO algorithm was proposed,

\Delta\alpha_i = -\frac{y_i E_i}{K(x_i, x_i)} = -\frac{y_i f_i - 1}{K(x_i, x_i)} = \frac{1 - y_i f_i}{K(x_i, x_i)},    (6)

where Ei = fi - yi denotes the difference between the value of the decision function f at the point xi and the desired target (label) yi. Note the equality of (5a) and (6) when the learning rate in (5a) is chosen to be ηi = 1/K(xi, xi). An important part of the SMO algorithm is to check the KKT conditions with precision τ (e.g., τ = 10^-3) in each step. An update is performed only if

\alpha_i < C \wedge y_i E_i < -\tau, or
\alpha_i > 0 \wedge y_i E_i > \tau.    (6a)

After an update, the same clipping operation as in (5b) is performed,

\alpha_i \leftarrow \min(\max(0, \alpha_i + \Delta\alpha_i), C),\quad i = 1, \ldots, l.    (6b)
It is the nonlinear clipping operation in (5b) and (6b) that makes the KA and the SMO without-bias-term algorithm strictly equal in solving nonlinear classification problems. This fact sheds new light on both algorithms. The equality is not that obvious in the case of a 'classic' SMO algorithm with a bias term, due to the heuristics involved in the selection of active points, which should ensure the largest increase of the dual Lagrangian Ld during the iterative optimization steps.
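A sketch of the corresponding no-bias SMO loop with the KKT check (6a); the sweep over all i and the stopping rule ("no change in a full pass") are implementation choices of ours, not prescribed by the paper:

```python
import numpy as np

def smo_no_bias_classification(K, y, C=1.0, tau=1e-3, max_epochs=100):
    """No-bias SMO updates (6)-(6b): change alpha_i only when the
    KKT conditions (6a) are violated by more than tau."""
    l = len(y)
    alpha = np.zeros(l)
    for _ in range(max_epochs):
        changed = False
        for i in range(l):
            f_i = np.sum(alpha * y * K[:, i])
            E_i = f_i - y[i]
            violated = (alpha[i] < C and y[i] * E_i < -tau) or \
                       (alpha[i] > 0 and y[i] * E_i > tau)        # Eq. (6a)
            if violated:
                delta = -y[i] * E_i / K[i, i]                     # Eq. (6)
                alpha[i] = min(max(0.0, alpha[i] + delta), C)     # Eq. (6b)
                changed = True
        if not changed:        # all KKT conditions satisfied within tau
            break
    return alpha
```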
2.2 Incremental Learning in Regression

Similarly to the case of classification, there is a strict equality between the KA and the
SMO algorithm when positive definite kernels are used for nonlinear regression.

a) Kernel AdaTron in regression


The first extension of the Kernel AdaTron algorithm for regression is presented in
(Veropoulos, 2001) as the following gradient ascent update rules for αi and αi*
\Delta\alpha_i = \eta_i \frac{\partial L_d}{\partial \alpha_i} = \eta_i \left( y_i - \varepsilon - \sum_{j=1}^{l} (\alpha_j - \alpha_j^*) K(x_j, x_i) \right) = \eta_i (y_i - \varepsilon - f_i) = -\eta_i (E_i + \varepsilon),    (7a)

\Delta\alpha_i^* = \eta_i \frac{\partial L_d}{\partial \alpha_i^*} = \eta_i \left( -y_i - \varepsilon + \sum_{j=1}^{l} (\alpha_j - \alpha_j^*) K(x_j, x_i) \right) = \eta_i (-y_i - \varepsilon + f_i) = \eta_i (E_i - \varepsilon),    (7b)

where yi is the measured value for the input xi, ε is the prescribed insensitivity zone, and Ei = fi - yi stands for the difference between the value of the regression function f at the point xi and the desired target value yi at this point. The calculation of the gradients above does not take into account the geometric reality that no training data point can be on both sides of the tube. In other words, it does not use the fact that at least one of αi and αi* will be zero, i.e., that αiαi* = 0 must be fulfilled in each iteration step. Below we derive the gradients of the dual Lagrangian Ld accounting for this geometry. This new formulation of the KA algorithm strictly equals the SMO method and is given as
\frac{\partial L_d}{\partial \alpha_i} = -K(x_i, x_i)\alpha_i - \sum_{j=1, j \ne i}^{l} (\alpha_j - \alpha_j^*) K(x_j, x_i) + y_i - \varepsilon + K(x_i, x_i)\alpha_i^* - K(x_i, x_i)\alpha_i^*
= -K(x_i, x_i)\alpha_i^* - (\alpha_i - \alpha_i^*) K(x_i, x_i) - \sum_{j=1, j \ne i}^{l} (\alpha_j - \alpha_j^*) K(x_j, x_i) + y_i - \varepsilon
= -K(x_i, x_i)\alpha_i^* + y_i - \varepsilon - f_i = -\left( K(x_i, x_i)\alpha_i^* + E_i + \varepsilon \right)    (8a)
For the αi* multipliers, the value of the gradient is

\frac{\partial L_d}{\partial \alpha_i^*} = -K(x_i, x_i)\alpha_i + E_i - \varepsilon.    (8b)
The update value for αi is now

\Delta\alpha_i = \eta_i \frac{\partial L_d}{\partial \alpha_i} = -\eta_i \left( K(x_i, x_i)\alpha_i^* + E_i + \varepsilon \right),    (9a)

\alpha_i \leftarrow \alpha_i + \Delta\alpha_i = \alpha_i + \eta_i \frac{\partial L_d}{\partial \alpha_i} = \alpha_i - \eta_i \left( K(x_i, x_i)\alpha_i^* + E_i + \varepsilon \right).    (9b)
For the learning rate ηi = 1/K(xi, xi) the gradient ascent learning KA is defined as

\alpha_i \leftarrow \alpha_i - \alpha_i^* - \frac{E_i + \varepsilon}{K(x_i, x_i)}.    (10a)

Similarly, the update rule for αi* is

\alpha_i^* \leftarrow \alpha_i^* - \alpha_i + \frac{E_i - \varepsilon}{K(x_i, x_i)}.    (10b)
As in classification, αi and αi* are clipped between zero and C,

\alpha_i \leftarrow \min(\max(0, \alpha_i), C),\quad i = 1, \ldots, l,    (11a)

\alpha_i^* \leftarrow \min(\max(0, \alpha_i^*), C),\quad i = 1, \ldots, l.    (11b)
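A minimal sketch of the KA regression updates (10a)-(10b) with the clipping (11a)-(11b); evaluating both candidate values from the current αi, αi* before clipping, and the fixed epoch count, are our own implementation choices:

```python
import numpy as np

def kernel_adatron_regression(K, y, C=1.0, eps=0.1, n_epochs=100):
    """KA regression updates (10a)-(10b) with the clipping (11a)-(11b),
    using the learning rate eta_i = 1 / K(x_i, x_i)."""
    l = len(y)
    alpha = np.zeros(l)        # multipliers for points above the tube
    alpha_s = np.zeros(l)      # starred multipliers (points below the tube)
    for _ in range(n_epochs):
        for i in range(l):
            f_i = np.sum((alpha - alpha_s) * K[:, i])    # regression function
            E_i = f_i - y[i]
            a_new = alpha[i] - alpha_s[i] - (E_i + eps) / K[i, i]     # (10a)
            as_new = alpha_s[i] - alpha[i] + (E_i - eps) / K[i, i]    # (10b)
            # a_new + as_new = -2*eps/K_ii <= 0, so after clipping at most
            # one multiplier stays positive and alpha_i * alpha_i* = 0 holds
            alpha[i] = min(max(0.0, a_new), C)                        # (11a)
            alpha_s[i] = min(max(0.0, as_new), C)                     # (11b)
    return alpha, alpha_s
```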
b) SMO without-bias-term in regression

The first algorithm for the SMO without-bias-term in regression (together with a detailed analysis of the KKT conditions for checking the optimality of the solution) is derived in (Vogt, 2002). The following learning rules for the updates of the Lagrange multipliers αi and αi* were proposed,

\alpha_i \leftarrow \alpha_i - \alpha_i^* - \frac{E_i + \varepsilon}{K(x_i, x_i)},    (12a)

\alpha_i^* \leftarrow \alpha_i^* - \alpha_i + \frac{E_i - \varepsilon}{K(x_i, x_i)}.    (12b)
The equality of equations (10a, b) and (12a, b) is obvious when the learning rate, as presented above in (10a, b), is chosen to be ηi = 1/K(xi, xi). Thus, in both classification and regression, the optimal learning rate is not necessarily equal for all training data pairs. For a Gaussian kernel, η = 1 is the same for all data points, while for a complete n-th order polynomial kernel each data point has a different learning rate ηi = 1/(xiTxi + 1)^n. Similarly to classification, a joint update of αi and αi* is performed only if the KKT conditions are violated by at least τ, i.e., if

\alpha_i < C \wedge \varepsilon + E_i < -\tau, or
\alpha_i > 0 \wedge \varepsilon + E_i > \tau, or
\alpha_i^* < C \wedge \varepsilon - E_i < -\tau, or
\alpha_i^* > 0 \wedge \varepsilon - E_i > \tau.    (13)
After the changes, the same clipping operations as defined in (11) are performed,

\alpha_i \leftarrow \min(\max(0, \alpha_i), C),\quad i = 1, \ldots, l,    (14a)

\alpha_i^* \leftarrow \min(\max(0, \alpha_i^*), C),\quad i = 1, \ldots, l.    (14b)
The KA learning as formulated in this paper and the SMO algorithm without-bias-term for solving regression tasks are strictly equal in terms of both the number of iterations required and the final values of the Lagrange multipliers. The equality is strict despite the fact that the implementations are slightly different. Namely, in every iteration step the KA algorithm updates both weights αi and αi* without checking whether the KKT conditions are fulfilled, while the SMO performs an update only according to the conditions (13).
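In code, the only difference of the SMO variant from the KA regression sketch above is that the joint update is gated by the KKT check (13); a sketch of that gate (function and argument names are ours):

```python
def kkt_violated_regression(alpha_i, alpha_s_i, E_i, eps, C, tau=1e-3):
    """True if the regression KKT conditions (13) are violated by more
    than tau, i.e., if the joint (alpha_i, alpha_i*) update should run."""
    return ((alpha_i < C and eps + E_i < -tau) or
            (alpha_i > 0 and eps + E_i > tau) or
            (alpha_s_i < C and eps - E_i < -tau) or
            (alpha_s_i > 0 and eps - E_i > tau))
```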

3. The Coordinate Ascent Based Learning for Nonlinear Classification and Regression Tasks
When positive definite kernels are used, the learning problem for both tasks is the same. In vector-matrix notation, in a dual space, the learning is represented as:

maximize L_d(\alpha) = -0.5\, \alpha^T K \alpha + f^T \alpha,    (15)

s.t. 0 \le \alpha_i \le C,\; i = 1, \ldots, n,    (16)
where, in classification, n = l and the matrix K is an (l, l) symmetric positive definite matrix, while in regression n = 2l and K is a (2l, 2l) symmetric positive semidefinite one. Note that the constraints (16) define a convex subspace over which the dual Lagrangian (15) should be maximized. It is very well known that the vector α may be regarded as the solution of the system of linear equations

K\alpha = f    (17)

subject to the same constraints as given by (16).
Thus, it may seem natural to solve (17), subject to (16), by applying some of the well-known and established techniques for solving a general linear system of equations. The size of the training data set and the constraints (16) eliminate direct techniques. Hence, one has to resort to iterative approaches for solving the problems above. There are three possible iterative avenues that can be followed: the use of the Non-Negative Least Squares (NNLS) technique (Lawson and Hanson, 1974), the application of the Non-Negative Conjugate Gradient (NNCG) method (Hestenes, 1980), and the implementation of the Gauss-Seidel (GS) method, i.e., the related Successive Over-Relaxation (SOR) technique. The first two methods handle the non-negativity constraints only. Thus, they are not suitable for solving 'soft' tasks, when a penalty parameter C < ∞ is used, i.e., when there is an upper bound on the maximal value of αi. Nevertheless, in the case of nonlinear regression, one can apply NNLS and NNCG by taking C = ∞ and compensating (i.e., smoothing or 'softening' the solution) by increasing the insensitivity zone ε. However, the two methods (namely NNLS and NNCG) are not suitable for solving soft-margin (C < ∞) classification problems in their present form, because there is no other parameter that can be used in 'softening' the margin.
Here we show how to extend the application of GS and SOR to both the nonlinear classification and the nonlinear regression tasks. The Gauss-Seidel method solves (17) by using the i-th equation to update the i-th unknown iteratively, i.e., in the k-th step the first equation is used to compute α1^{k+1}, then the second equation is used to calculate α2^{k+1} by using the new α1^{k+1} and the old αi^k (i > 2), and so on. The iterative learning takes the following form,


\alpha_i^{k+1} = \left( f_i - \sum_{j=1}^{i-1} K_{ij} \alpha_j^{k+1} - \sum_{j=i+1}^{n} K_{ij} \alpha_j^{k} \right) / K_{ii} = \alpha_i^{k} - \frac{1}{K_{ii}} \left( \sum_{j=1}^{i-1} K_{ij} \alpha_j^{k+1} + \sum_{j=i}^{n} K_{ij} \alpha_j^{k} - f_i \right) = \alpha_i^{k} + \frac{1}{K_{ii}} \left. \frac{\partial L_d}{\partial \alpha_i} \right|_{k+1}    (18)
where we use the fact that the term within the second bracket is, up to the sign, the i-th element of the gradient of the dual Lagrangian Ld given in (15) at the (k+1)-th iteration step (its negation, f_i - \sum_j K_{ij}\alpha_j, is the residual ri of the mathematical references). Equation (18) shows that the GS method is a coordinate gradient ascent procedure, just as the KA and the SMO are. The KA and SMO for positive definite kernels equal the GS! Note that the optimal learning rate used in both the KA algorithm and the SMO without-bias-term approach is exactly equal to the coefficient 1/Kii in the GS method. Based on this equality, the convergence theorem for the KA, SMO and GS (i.e., SOR) in solving (15) subject to the constraints (16) can be stated and proved as follows:
Theorem: For SVMs with positive definite kernels, the iterative learning algorithms KA, i.e., SMO, i.e., GS, i.e., SOR, in solving the nonlinear classification and regression tasks (15) subject to the constraints (16), converge starting from any initial choice of α^0.

Proof: The proof is based on the very well-known theorem on the convergence of the GS method for symmetric positive definite matrices in solving (17) without constraints (Ostrowski, 1966). First note that for positive definite kernels, the matrix K created by the terms yiyjK(xi, xj) in the second sum in (1), and involved in solving the classification problem, is also positive definite. In regression tasks K is a symmetric positive semidefinite (meaning still convex) matrix, which after a mild regularization given as (K ← K + λI, λ ~ 1e-12) becomes a positive definite one. (Note that the proof in the case of regression does not need regularization at all, but there is no space here to go into these details.) Hence, the learning without the constraints (16) converges, starting from any initial point α^0, and each point in the n-dimensional search space for the multipliers αi is a viable starting point ensuring convergence of the algorithm to the maximum of the dual Lagrangian Ld. This naturally includes all (starting) points within, or on the boundary of, any convex subspace of the search space, ensuring the convergence of the algorithm to the maximum of the dual Lagrangian Ld over the given subspace. The constraints imposed by (16), preventing the variables αi from being negative or larger than C, and implemented by the clipping operators above, define such a convex subspace. Thus, each 'clipped' multiplier value αi defines a new starting point of the algorithm, guaranteeing the convergence to the maximum of Ld over the subspace defined by (16). For a convex constraining subspace such a constrained maximum is unique. Q.E.D.
Due to the lack of space we do not go into a discussion of the convergence rate here and leave it for another occasion. It should only be mentioned that both KA and SMO (i.e., GS and SOR) for positive definite kernels have been successfully applied to many problems (see the references given here, as well as many others benchmarking the mentioned methods on various data sets). Finally, let us just mention that the standard extension of the GS method is the method of successive over-relaxation, which can significantly reduce the number of iterations required by a proper choice of the relaxation parameter ω. The SOR method uses the following updating rule,

\alpha_i^{k+1} = \alpha_i^{k} - \frac{\omega}{K_{ii}} \left( \sum_{j=1}^{i-1} K_{ij} \alpha_j^{k+1} + \sum_{j=i}^{n} K_{ij} \alpha_j^{k} - f_i \right) = \alpha_i^{k} + \frac{\omega}{K_{ii}} \left. \frac{\partial L_d}{\partial \alpha_i} \right|_{k+1}    (19)

and similarly to the KA, SMO, and GS its convergence is guaranteed.
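A sketch of the projected SOR sweep (19) with clipping to the box (16); with ω = 1 it reduces to the GS step (18), i.e., to the KA/SMO update with ηi = 1/Kii. For classification one would pass f = 1 (a vector of ones) and the matrix with entries yiyjK(xi, xj) from (1); the function name and iteration count are our own:

```python
import numpy as np

def projected_sor(K, f, C=1.0, omega=1.0, n_iters=100):
    """Projected SOR iteration (19) for max L_d = -0.5 a'Ka + f'a
    s.t. 0 <= a_i <= C; omega = 1 recovers the Gauss-Seidel step (18),
    i.e., the KA/SMO update with eta_i = 1 / K_ii."""
    n = len(f)
    alpha = np.zeros(n)
    for _ in range(n_iters):
        for i in range(n):
            # in-place updates give the mixed old/new sums of Eq. (19)
            grad_i = f[i] - K[i, :] @ alpha          # i-th gradient of L_d
            alpha_i = alpha[i] + omega * grad_i / K[i, i]   # Eq. (19)
            alpha[i] = min(max(0.0, alpha_i), C)     # clip to the convex set (16)
    return alpha
```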

4. Conclusions
Both the KA and the SMO algorithms were recently developed and introduced as alternatives to solving the quadratic programming problem while training support vector machines on huge data sets. It was shown that, when positive definite kernels are used, the two algorithms are identical in their analytic form and numerical implementation. In addition, for positive definite kernels both algorithms are strictly identical to the classic iterative GS (optimal coordinate ascent) learning and its extension SOR. Until now, these facts were blurred, mainly due to the different ways of posing the learning problems and due to the 'heavy' heuristics involved in SMO implementations, which obscured the possible identity of the methods. It is shown that in the so-called no-bias SVMs both the KA and the SMO procedure are coordinate ascent based methods. Finally, due to the many ways in which all three algorithms (KA, SMO and GS, i.e., SOR) can be implemented, there may be some differences in their overall behaviour. The introduction of the relaxation parameter 0 < ω < 2 will speed up the algorithm. The exact optimal value ωopt is problem dependent.
Acknowledgment: The results presented were initiated during the stay of the first author at Prof. Rolf Isermann's Institute and sponsored by the Deutsche Forschungsgemeinschaft (DFG). He is thankful to both Prof. Rolf Isermann and the DFG for all the support during this stay.

5. References
1. Anlauf, J. K., Biehl, M., The AdaTron - an adaptive perceptron algorithm. Europhys-
ics Letters, 10(7), pp. 687–692, 1989
2. Cherkassky, V., Mulier, F., Learning From Data: Concepts, Theory and Methods,
John Wiley & Sons, New York, NY, 1998
3. Cristianini, N., Shawe-Taylor, J., An introduction to Support Vector Machines and
other kernel-based learning methods, Cambridge University Press, Cambridge, UK,
2000
4. Evgeniou, T., Pontil, M., Poggio, T., Regularization networks and support vector ma-
chines, Advances in Computational Mathematics, 13, pp.1-50, 2000.
5. Frieß, T.-T., Cristianini, N., Campbell, I. C. G., The Kernel-Adatron: a Fast and Sim-
ple Learning Procedure for Support Vector Machines. In Shavlik, J., editor, Proceed-
ings of the 15th International Conference on Machine Learning, Morgan Kaufmann,
pp. 188–196, San Francisco, CA, 1998
6. Kecman V., Learning and Soft Computing, Support Vector Machines, Neural Net-
works, and Fuzzy Logic Models, The MIT Press, Cambridge, MA,
(http://www.support-vector.ws), 2001
7. Lawson, C. I., Hanson, R. J., Solving Least Squares Problems, Prentice-Hall, Engle-
wood Cliffs, N.J., 1974
8. Ostrowski, A.M., Solutions of Equations and Systems of Equations, 2nd ed., Aca-
demic Press, New York, 1966
9. Platt, J. C., Sequential minimal optimization: A fast algorithm for training support
vector machines. TR MSR-TR-98-14, Microsoft Research, 1998
10. Platt, J.C., Fast Training of Support Vector Machines using Sequential Minimal Op-
timization. Ch. 12 in Advances in Kernel Methods – Support Vector Learning, edited
by B. Schölkopf, C. Burges, A. Smola, The MIT Press, Cambridge, MA, 1999
11. Schölkopf B., Smola, A., Learning with Kernels – Support Vector Machines, Optimi-
zation, and Beyond, The MIT Press, Cambridge, MA, 2002
12. Veropoulos, K., Machine Learning Approaches to Medical Decision Making, PhD
Thesis, The University of Bristol, Bristol, UK, 2001
13. Vapnik, V.N., The Nature of Statistical Learning Theory, Springer Verlag Inc, New
York, NY, 1995
14. Vogt, M., SMO Algorithms for Support Vector Machines without Bias, Institute Re-
port, Institute of Automatic Control, TU Darmstadt, Darmstadt, Germany,
(http://w3.rt.e-technik.tu-darmstadt.de/~vogt/), 2002
