Archetypal Analysis for Machine Learning

Published in: IEEE International Workshop on Machine Learning for Signal Processing
Publication date: 2010
Document version: Early version, also known as pre-print

Citation (APA):
Mørup, M., & Hansen, L. K. (2010). Archetypal Analysis for Machine Learning. In IEEE International Workshop on Machine Learning for Signal Processing. IEEE. https://doi.org/10.1109/MLSP.2010.5589222
ARCHETYPAL ANALYSIS FOR MACHINE LEARNING
Table 1. Relation between the AA / PCH model and unsupervised methods such as SVD / PCA, NMF, K-means and K-medoids.
j^{new} = \arg\max_i \{\min_{j \in C} \|x_i - x_j\|\},   (2.1)

where \|\cdot\| is a given norm and C indexes the currently selected data points. For initializing AA we propose the following modification, forming our FurthestSum procedure:

j^{new} = \arg\max_i \{\sum_{j \in C} \|x_i - x_j\|\}.   (2.2)

To improve the identified set C, the first point, which is selected at random, is removed and an additional point is selected in its place. For the proposed FurthestSum method we have the following important property.

Theorem 2. The points generated by the FurthestSum algorithm are guaranteed to lie in the minimal convex set of the unselected data points.

Proof. We prove the theorem by contradiction. Assume that a selected point t does not lie in the minimal convex set, i.e. x_t = Xc such that |c|_1 = 1, c_d \geq 0, c_t = 0, while t = \arg\max_i \sum_{j \in C} \|x_i - x_j\|. We then have

\sum_{j \in C} \|x_t - x_j\| = \sum_{j \in C} \|Xc - x_j\| < \sum_d c_d \sum_{j \in C} \|x_d - x_j\| \leq \max_d \sum_{j \in C} \|x_d - x_j\|,

where the strict inequality follows from the triangle inequality applied to Xc - x_j = \sum_d c_d (x_d - x_j). Thus some data point d has a strictly larger sum of distances to the selected points than t, contradicting that t attains the maximum.

A comparison between the FurthestFirst and FurthestSum initialization procedures based on the 2-norm, i.e. \|x_i - x_j\|_2, can be found in figure 2. The primary computational cost of the FurthestSum procedure for the identification of T candidate points is the evaluation of the distances between all data points and the selected candidate points, which has an overall computational complexity of O(MNT).
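To make the procedure concrete, the following Python/NumPy sketch implements the FurthestSum selection of eq. (2.2) with the 2-norm, including the final swap of the initial random point. This is a minimal sketch rather than the paper's reference implementation; the function and variable names are ours.

import numpy as np

def furthest_sum(X, T, seed=None):
    # FurthestSum initialization (eq. 2.2): X is M x N with data
    # points as columns; returns T candidate column indices.
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    first = int(rng.integers(N))          # random initial point
    C = [first]
    # dist_sum[i] = sum over j in C of ||x_i - x_j||_2, kept incrementally
    dist_sum = np.linalg.norm(X - X[:, [first]], axis=0)
    while len(C) < T:
        cand = dist_sum.copy()
        cand[C] = -np.inf                 # restrict argmax to unselected points
        j = int(np.argmax(cand))          # eq. (2.2)
        C.append(j)
        dist_sum += np.linalg.norm(X - X[:, [j]], axis=0)
    # improve the set: drop the random first point, pick a replacement
    dist_sum -= np.linalg.norm(X - X[:, [first]], axis=0)
    C.remove(first)
    cand = dist_sum.copy()
    cand[C] = -np.inf
    C.append(int(np.argmax(cand)))
    return C

Each greedy pick costs one evaluation of the distances from all N points to the newest selected point, so selecting T points costs O(MNT), in line with the complexity stated above.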
Fig. 4. Analysis of an Altanserin tracer Positron Emission Tomography data set. Top panel: the temporal profiles extracted by three-component SVD/PCA, NMF, AA, and K-means models. Of the four models, only the AA model has correctly extracted the region-specific temporal profiles, which according to the spatial maps in the bottom panel correspond well to non-binding, high-binding, and vascular regions, respectively. As there is no scaling ambiguity, the estimated arterial tissue curves are in the correct units.
2.3. Projected Gradient for AA

We propose a simple projected gradient procedure that can be adapted to any proximity measure. In the present analysis we will, however, without loss of generality consider proximity measured by least squares. In the paper of [1] the model was estimated by non-negative least squares such that the linear constraints were enforced by introducing quadratic penalty terms in the alternating updates of S and C, i.e. minimizing

\|X - XCS\|_F^2 + M \sum_d (|c_d|_1 - 1)^2 + M \sum_j (|s_j|_1 - 1)^2,

where M is some large number. Alternatively, standard non-negative quadratic programming solvers with linear constraints can be invoked [13] for each alternating subproblem, solving for S with C fixed and vice versa. We found, however, that the following projected gradient method works efficiently in practice. We recast the AA problem in the normalization-invariant variables \tilde{s}_{k,n} = s_{k,n}/\sum_{k'} s_{k',n} and \tilde{c}_{n,k} = c_{n,k}/\sum_{n'} c_{n',k} such that the equality constraints are explicitly satisfied. Noticing that

\frac{\partial \tilde{s}_{k',n}}{\partial s_{k,n}} = \frac{\delta_{k,k'}}{\sum_{k''} s_{k'',n}} - \frac{s_{k',n}}{(\sum_{k''} s_{k'',n})^2},

and differentiating by parts, we find the following updates for the AA parameters recast in the above normalization-invariant variables:

s_{k,n} \leftarrow \max\{\tilde{s}_{k,n} + \mu_{\tilde{S}} (g^{\tilde{S}}_{k,n} - \sum_{k'} g^{\tilde{S}}_{k',n} \tilde{s}_{k',n}), 0\},   \tilde{s}_{k,n} = \frac{s_{k,n}}{\sum_{k'} s_{k',n}},   G^{\tilde{S}} = \tilde{C}^\top X^\top X - \tilde{C}^\top X^\top X \tilde{C}\tilde{S},

c_{n,k} \leftarrow \max\{\tilde{c}_{n,k} + \mu_{\tilde{C}} (g^{\tilde{C}}_{n,k} - \sum_{n'} g^{\tilde{C}}_{n',k} \tilde{c}_{n',k}), 0\},   \tilde{c}_{n,k} = \frac{c_{n,k}}{\sum_{n'} c_{n',k}},   G^{\tilde{C}} = X^\top X \tilde{S}^\top - X^\top X \tilde{C}\tilde{S}\tilde{S}^\top.
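As an illustration of these updates, here is a minimal Python/NumPy sketch of the normalization-invariant projected gradient scheme for the least-squares case. The fixed step sizes mu_s and mu_c, the iteration count, and the random initialization are our assumptions for the sketch; in practice the step sizes would be tuned, e.g. by line search.

import numpy as np

def aa_projected_gradient(X, K, n_iter=500, mu_s=1e-2, mu_c=1e-2, seed=None):
    # Least-squares archetypal analysis X ~ X C S, with the columns of
    # C (N x K) and S (K x N) constrained to the simplex.
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    S = rng.random((K, N)); S /= S.sum(0)
    C = rng.random((N, K)); C /= C.sum(0)
    XtX = X.T @ X                          # cached Gram matrix
    for _ in range(n_iter):
        # G^S = C^T X^T X - C^T X^T X C S: descent direction for S
        # (negative gradient of the squared error, up to a factor of 2)
        G_S = C.T @ XtX - (C.T @ XtX @ C) @ S
        # normalization-invariant step, projection to >= 0, re-normalize
        S = np.maximum(S + mu_s * (G_S - (G_S * S).sum(0)), 0)
        S /= np.maximum(S.sum(0), 1e-12)
        # G^C = X^T X S^T - X^T X C S S^T: descent direction for C
        G_C = XtX @ S.T - XtX @ C @ (S @ S.T)
        C = np.maximum(C + mu_c * (G_C - (G_C * C).sum(0)), 0)
        C /= np.maximum(C.sum(0), 1e-12)
    return C, S

A call such as C, S = aa_projected_gradient(X, K=3) yields archetypes XC and simplex loadings S; the columns of C could alternatively be seeded from the FurthestSum indices of section 2.2.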
3. RESULTS