
ARCHETYPAL ANALYSIS FOR MACHINE LEARNING

Morten Mørup and Lars Kai Hansen

Cognitive Systems Group, Technical University of Denmark


Richard Petersens Plads, bldg. 321, 2800 Lyngby, Denmark, e-mail: {mm,lkh}@imm.dtu.dk

ABSTRACT

Archetypal analysis (AA), proposed by Cutler and Breiman in [1], estimates the principal convex hull of a data set. As such, AA favors features that constitute representative 'corners' of the data, i.e., distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering without being limited to hard assignment, and the uniqueness of SVD without being limited to orthogonal representations. In order to do large scale AA, we derive an efficient algorithm based on projected gradient, as well as an initialization procedure inspired by the FurthestFirst approach widely used for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering.

1. INTRODUCTION

Decomposition approaches have become a key tool for a wide variety of massive data analyses, from the modeling of Internet data such as term-document matrices of word occurrences, bio-informatics data such as micro-array data of gene expression, and neuroimaging data such as neural activity measured over space and time, to collaborative filtering problems such as the celebrated Netflix problem, to mention but a few. The conventional approaches range from low-rank approximations such as singular value decomposition (SVD), principal component analysis (PCA) [3], independent component analysis (ICA) [4], sparse coding (SC) [5] and non-negative matrix factorization (NMF) [6] to hard assignment clustering methods such as K-means and K-medoids. Common to these is that they can be understood as a linear mixture or factor analysis type representation of the data with various constraints. Thus data x_{m,n}, where m = 1, ..., M is the feature index and n = 1, ..., N is the sample index, is written in terms of hidden variables s_{k,n} and projections a_{m,k} with k = 1, ..., K,

    x_{m,n} = \sum_k a_{m,k} s_{k,n} + e_{m,n}, \quad e_{m,n} \sim \mathcal{N}(0, \sigma^2),    (1.1)

typically with a Gaussian noise model. SVD/PCA requires A and S to be orthogonal; in ICA statistical independence is assumed for S; in SC a penalty term is introduced that measures deviation from sparsity of S; while in NMF all variables are constrained to be non-negative. In hard clustering by K-means, S is constrained to be a binary assignment matrix such that A = XS^\top (SS^\top)^{-1} represents the Euclidean centers of the clusters, while for K-medoids a_k = x_n for some n, i.e., the cluster centers have to constitute actual data points.

Despite the similarities of the above approaches, their internal representations of the data differ greatly, and with them the nature of the interpretations they offer. In SVD/PCA the features constitute the directions of maximal variation, i.e., so-called eigenmaps; for NMF the features are constituent parts; for SC the features are likewise atoms or dictionary elements; while K-means and K-medoids find the most representative prototype objects.

A benefit of clustering approaches is that the features are similar to measured data, making the results easier to interpret; however, the binary assignments reduce flexibility. Also, clustering typically involves complex combinatorial optimization, leading to a plethora of heuristics. On the other hand, low-rank approximations based on SVD/PCA/NMF have a great degree of flexibility, but the features can be harder to interpret, as invariance to rotation of the extracted features can lead to lack of uniqueness, i.e., X ≈ AS = AQQ^{-1}S = \tilde{A}\tilde{S}. In addition, SVD/PCA/ICA/SC are prone to cancellation effects in which two components both lose meaning because they locally become highly correlated, taking positive and negative near-cancelling values (while still being globally orthogonal).

In conclusion, clustering approaches give easily interpretable features but pay a price in modeling flexibility due to the binary assignment of data objects. Approaches such as SVD/PCA/ICA/NMF/SC have added model flexibility and as such can be more efficient in capturing, e.g., variance; however, this efficiency can lead to complex representations from which we learn relatively little.

Archetypal analysis (AA), proposed by [1], directly combines the virtues of clustering and the flexibility of matrix factorization.

(We thank Gitte Moos Knudsen and Claus Svarer for providing the PET data set.)
In the original paper on AA [1], the method was demonstrated useful in the analysis of air pollution and head shape, and later also for tracking spatio-temporal dynamics [7]. Recently, archetypal analysis has found use in benchmarking and market research, identifying typically extreme practices rather than just good practices [8], as well as in the analysis of astronomy spectra [9] as an approach to the end-member extraction problem [10]. In this paper we demonstrate the following important theoretical properties of AA:

• The archetypal analysis (AA) model is unique.

• AA can be efficiently initialized through the proposed FurthestSum method.

• AA can be efficiently computed using a simple projected gradient method.

We further demonstrate that AA is useful for a wide variety of important machine learning problem domains, resulting in easily interpretable features that account well for the inherent dynamics in the data.

2. ARCHETYPAL ANALYSIS AND THE PRINCIPAL CONVEX HULL

The convex hull, also denoted the convex envelope, of a data matrix X is the minimal convex set containing X. Informally, it can be described as a rubber band wrapped around the data points, see also figure 1. While the problem of finding the convex hull is solvable in linear time (i.e., O(N)) [11], the size of the convex set increases dramatically with the dimensionality of the data: the expected size of the convex set for N points in general position in K-dimensional space grows exponentially with dimension as O(log^{K-1}(N)) [12]. As a result, in high-dimensional spaces the 'minimal' convex set forming the convex hull does not provide a compact data representation, see also figure 1. Archetypal analysis [1] instead considers the principal convex hull, i.e., the (K-1)-dimensional convex hull that best accounts for the data according to some measure of distortion D(·|·). This can formally be stated as finding the optimal C and S for the problem

    \arg\min_{C,S} D(X \mid XCS)
    \quad \text{s.t.} \quad |c_k|_1 = 1, \; |s_n|_1 = 1, \; C \ge 0, \; S \ge 0.

The constraint |c_k|_1 = 1 together with C ≥ 0 enforces the feature matrix A = XC, as given in equation (1.1), to be a weighted average (i.e., a convex combination) of the data observations, while the constraint |s_n|_1 = 1, S ≥ 0 requires the nth data point to be approximated by a weighted average (i.e., a convex combination) of the feature vectors XC ∈ R^{M×K}. For brevity we will presently consider D(X|XCS) = ||X − XCS||_F^2. As such, in line with principal component analysis, the optimal C, S will generate the principal (i.e., dominant) convex hull for the data X, see figure 1. Archetypal analysis favors features that constitute representative 'corners' of the data, i.e., distinct aspects or archetypes. Furthermore, the AA model can naturally be considered a model between low-rank factor-type approximations and clustering approaches, see table 1. As for K-means and K-medoids, archetypal analysis is invariant to scale and translation of the data, and as noted in [1] the AA problem is non-convex.
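To make the model and its constraints concrete, the following minimal numpy sketch (our illustration, not from the paper; sizes and variable names are hypothetical) constructs a feasible pair (C, S) and evaluates the least squares distortion D(X|XCS):

import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 200, 4                       # features, samples, archetypes
X = rng.standard_normal((M, N))

# A feasible (C, S): non-negative with unit l1 column sums,
# i.e. |c_k|_1 = 1, C >= 0 and |s_n|_1 = 1, S >= 0.
C = rng.random((N, K))
C /= C.sum(axis=0, keepdims=True)
S = rng.random((K, N))
S /= S.sum(axis=0, keepdims=True)

A = X @ C                                 # archetypes: convex combinations of data points
D = np.linalg.norm(X - A @ S, "fro")**2   # distortion D(X|XCS) = ||X - XCS||_F^2
print(D)

Constraining S to binary assignments instead would recover the K-means/K-medoids columns of table 1.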
Fig. 1. Illustration of the AA representation of data in 2D (left panel) and 3D (middle and right panels). Blue dots and lines indicate the convex sets and hulls respectively, whereas green lines and dark shaded regions indicate the extracted principal convex hulls. Notice how the four-component principal convex hulls (left and middle panels) and the five-component convex hull (right panel) account for most of the dynamics in the data, while the complete convex hull, even in 2D, is given by a rather large convex set. The end points of the principal convex hull are convex combinations of the data points. These end points constitute the distinct regions of the clusters, rather than the central regions that would be extracted by K-means. As such, the 5th cluster, given in cyan (right panel), is modeled as a combination of existing aspects rather than given a feature of its own.

2.1. Uniqueness of AA

Lack of uniqueness in matrix decomposition is a main motivation behind rotational criteria such as varimax in factor analysis, as well as behind imposing statistical independence in ICA [4] and sparsity [5]. We presently prove that AA is in general unique up to a permutation of the components.

Theorem 1. Assume ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k. Then the AA model does not suffer from rotational ambiguity, i.e., if X ≈ XCS = XCQ^{-1}QS = X\tilde{C}\tilde{S} such that both (C, S) and (\tilde{C}, \tilde{S}) are equivalent solutions to AA, then Q is a permutation matrix.

Proof. Since S and \tilde{S} = QS, with S ≥ 0 and \tilde{S} ≥ 0, are both solutions to AA, we have |s_n|_1 = 1 and |Qs_n|_1 = 1 for all n. For this to hold, Q has to be a Markov matrix, i.e., \sum_{k'} q_{k,k'} = 1, Q ≥ 0. Since C ≥ 0 and CQ^{-1} ≥ 0, and given ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k, Q^{-1} also has to be non-negative. Since both Q and Q^{-1} are non-negative, Q can only be a scale-and-permutation matrix, and as the rows of Q must sum to one, Q can only be a permutation matrix.
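Theorem 1 is easy to check numerically; a small sketch (ours, not from the paper) confirms that a permuted solution reproduces XCS exactly while remaining feasible:

import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 4
X = rng.standard_normal((3, N))
C = rng.random((N, K)); C /= C.sum(axis=0)
S = rng.random((K, N)); S /= S.sum(axis=0)

Q = np.eye(K)[rng.permutation(K)]       # a permutation matrix; Q^{-1} = Q^T
C2, S2 = C @ Q.T, Q @ S                 # the transformed solution (C Q^{-1}, Q S)
assert np.allclose(X @ C @ S, X @ C2 @ S2)   # identical reconstruction
assert np.allclose(C2.sum(axis=0), 1) and np.allclose(S2.sum(axis=0), 1)  # still feasible

A generic invertible Q would instead violate the non-negativity or unit-sum constraints, which is the content of the proof above.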
         SVD/PCA    NMF       AA/PCH                K-means               K-medoids
    C    C ∈ R      XC ≥ 0    |c_k|_1 = 1, C ≥ 0    |c_k|_1 = 1, C ≥ 0    |c_k|_1 = 1, C ∈ B
    S    S ∈ R      S ≥ 0     |s_n|_1 = 1, S ≥ 0    |s_n|_1 = 1, S ∈ B    |s_n|_1 = 1, S ∈ B

Table 1. Relation between the AA/PCH model and unsupervised methods such as SVD/PCA, NMF, K-means and K-medoids (B denotes binary matrices).

The requirement ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k states that for each column of C there has to exist a row in which that column's element is the only non-zero element. This holds for the AA model, as two distinct aspects in general position will not be convex combinations of the same data points. We note that although AA is unique in general, there is no guarantee that the optimal solution will be identified, due to the occurrence of local minima.

2.2. Efficient initialization of AA by FurthestSum

Cutler and Breiman point out in [1] that careful initialization improves the speed of convergence and lowers the risk of finding insignificant archetypes. For K-means, a popular initialization procedure is the FurthestFirst method described in [2]: a data point is selected at random as the centroid of a cluster, and subsequent data points are selected furthest away from the already selected points. As such, a new data point j^{new} is selected according to

    j^{\mathrm{new}} = \arg\max_i \min_{j \in \mathcal{C}} \|x_i - x_j\|,    (2.1)

where \|\cdot\| is a given norm and \mathcal{C} indexes the currently selected data points. For initializing AA we propose the following modification, forming our FurthestSum procedure:

    j^{\mathrm{new}} = \arg\max_i \sum_{j \in \mathcal{C}} \|x_i - x_j\|.    (2.2)

To improve the identified set \mathcal{C}, the first point, which was selected at random, is removed and an additional point is selected in its place. For the proposed FurthestSum method we have the following important property.

Theorem 2. The points generated by the FurthestSum algorithm are guaranteed to lie in the minimal convex set of the unselected data points.

Proof. We prove the theorem by contradiction. Assume that there is a selected point t not in the minimal convex set, i.e., x_t = Xc with |c|_1 = 1, c_d ≥ 0, c_t = 0, while t = \arg\max_i \sum_{j \in \mathcal{C}} \|x_i - x_j\|. We then have

    \sum_{j \in \mathcal{C}} \|x_t - x_j\| = \sum_{j \in \mathcal{C}} \|Xc - x_j\| < \sum_d c_d \sum_{j \in \mathcal{C}} \|x_d - x_j\| \le \max_d \sum_{j \in \mathcal{C}} \|x_d - x_j\|,

where the first inequality follows from the triangle inequality. Hence a better solution is given by a point different from t, contradicting that t was the optimal point.

Fig. 2. Illustration of the prototypes extracted by FurthestFirst [2] (green circles) and the proposed FurthestSum initialization procedure (red circles) for 4 (left), 25 (middle) and 100 (right) prototypes. Clearly, FurthestSum extracts points belonging to the convex set of the unselected data (useful for AA), whereas FurthestFirst distributes the prototypes evenly over the data region (useful for K-means).

A comparison between the FurthestFirst and FurthestSum initialization procedures can be found in figure 2, based on the 2-norm \|x_i - x_j\|_2. The primary computational cost of the FurthestSum procedure for the identification of T candidate points is the evaluation of the distances between all data points and the selected candidate points, which has an overall computational complexity of O(MNT).
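Written out, a minimal sketch of the FurthestSum selection under the 2-norm (our reading of equation (2.2) and of the re-selection step above; function and variable names are ours):

import numpy as np

def furthest_sum(X, K, seed=0):
    """Pick K candidate archetype indices from the columns of X (M x N)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    idx = [int(rng.integers(N))]                        # random initial point
    sum_dist = np.linalg.norm(X - X[:, idx[0]][:, None], axis=0)
    while len(idx) < K:                                 # eq. (2.2): maximize summed distance
        cand = sum_dist.copy()
        cand[idx] = -np.inf                             # exclude already selected points
        j = int(np.argmax(cand))
        idx.append(j)
        sum_dist += np.linalg.norm(X - X[:, j][:, None], axis=0)
    # Improve the set: remove the random first point and select one in its place.
    sum_dist -= np.linalg.norm(X - X[:, idx[0]][:, None], axis=0)
    idx.pop(0)
    cand = sum_dist.copy()
    cand[idx] = -np.inf
    idx.append(int(np.argmax(cand)))
    return idx

By Theorem 2, the selected points lie in the minimal convex set of the unselected data, which is what makes them suitable initial archetypes.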
2.3. Projected Gradient for AA

We presently propose a simple projected gradient procedure that can be adapted to any proximity measure; in the present analysis we will, however, without loss of generality consider proximity measured by least squares. In [1] the model was estimated by non-negative least squares, with the linear constraints enforced by introducing quadratic penalty terms in the alternating updates of S and C, i.e., by minimizing

    \|X - XCS\|_F^2 + M \sum_d (|c_d|_1 - 1)^2 + M \sum_j (|s_j|_1 - 1)^2,

where M is some large number. Alternatively, standard non-negative quadratic programming solvers with linear constraints can be invoked [13] for each alternating subproblem, solving for S with C fixed and vice versa. We found, however, that the following projected gradient method worked efficiently in practice.
Fig. 4. Analysis of an Altanserin-tracer Positron Emission Tomography data set. Top panel: the extracted temporal profiles of three-component SVD/PCA, NMF, AA and K-means models. Of the four models, only the AA model has correctly extracted the region-specific temporal profiles which, judging from the spatial maps in the bottom panel, correspond well to non-binding, high-binding and vascular regions respectively. As there is no scaling ambiguity, the estimated arterial tissue curves are in the correct units.

We recast the AA problem in the l1-normalization-invariant variables \tilde{s}_{k,n} = s_{k,n} / \sum_{k'} s_{k',n} and \tilde{c}_{n,k} = c_{n,k} / \sum_{n'} c_{n',k}, such that the equality constraints are explicitly satisfied. Noticing that

    \frac{\partial \tilde{s}_{k',n}}{\partial s_{k,n}} = \frac{\delta_{k,k'}}{\sum_{k''} s_{k'',n}} - \frac{s_{k',n}}{(\sum_{k''} s_{k'',n})^2}

and differentiating by parts, we find the following updates for the AA parameters recast in the above normalization-invariant variables:

    s_{k,n} \leftarrow \max\{\tilde{s}_{k,n} + \mu_{\tilde{S}} (g^{\tilde{S}}_{k,n} - \sum_{k'} g^{\tilde{S}}_{k',n} \tilde{s}_{k',n}), 0\}, \quad G^{\tilde{S}} = \tilde{C}^\top X^\top X - \tilde{C}^\top X^\top X \tilde{C}\tilde{S},

    c_{n,k} \leftarrow \max\{\tilde{c}_{n,k} + \mu_{\tilde{C}} (g^{\tilde{C}}_{n,k} - \sum_{n'} g^{\tilde{C}}_{n',k} \tilde{c}_{n',k}), 0\}, \quad G^{\tilde{C}} = X^\top X \tilde{S}^\top - X^\top X \tilde{C}\tilde{S}\tilde{S}^\top.

Each alternating update is performed on all elements simultaneously, and \mu is a step-size parameter that we tuned by line-search. In the update of S, the products \tilde{C}^\top X^\top X and \tilde{C}^\top X^\top X \tilde{C} can be pre-computed at a cost of O(KMN), while computing the gradient as well as evaluating the least squares objective, given by

    \text{const.} - 2\langle \tilde{C}^\top X^\top X, \tilde{S} \rangle + \langle \tilde{C}^\top X^\top X \tilde{C}, \tilde{S}\tilde{S}^\top \rangle,

has computational cost O(K^2 N). In the update of C, X^\top X \tilde{S}^\top and \tilde{S}\tilde{S}^\top can be pre-computed at costs of O(KMN) and O(K^2 N) respectively, while computing the gradient as well as evaluating the least squares objective, given by

    \text{const.} - 2\langle X^\top X \tilde{S}^\top, \tilde{C} \rangle + \langle \tilde{C}^\top X^\top X \tilde{C}, \tilde{S}\tilde{S}^\top \rangle,

also has a computational complexity of O(KMN). When only considering T candidate points to define the archetypes, as proposed in [13], this latter complexity can be further reduced to O(KMT). In [13] it was suggested to identify these candidate points as outlying data points found through projections onto the eigenvectors of the covariance matrix of X. We found that the proposed FurthestSum algorithm forms an efficient alternative for identifying these T candidate points. In our implementation of the projected gradient method we carried out 10 line-search updates for each alternating update of S and C.
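A compact numpy rendering of this alternating scheme (our sketch, not the paper's exact implementation: it uses a small fixed step size rather than line-search, and an explicit column renormalization in place of the invariant-variable bookkeeping):

import numpy as np

def aa_projected_gradient(X, K, n_iter=500, mu=0.01, seed=0):
    """Alternating projected gradient for least squares AA (sketch)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    C = rng.random((N, K)); C /= C.sum(axis=0, keepdims=True)
    S = rng.random((K, N)); S /= S.sum(axis=0, keepdims=True)
    XtX = X.T @ X                              # only pairwise inner products are needed
    for _ in range(n_iter):
        CtXtX = C.T @ XtX                      # precomputation for the S update
        G_S = CtXtX - (CtXtX @ C) @ S          # G^S = C'X'X - C'X'XCS
        S = np.maximum(S + mu * (G_S - (G_S * S).sum(axis=0)), 0.0)
        S /= np.maximum(S.sum(axis=0, keepdims=True), 1e-12)
        G_C = XtX @ S.T - XtX @ C @ (S @ S.T)  # G^C = X'XS' - X'XCSS'
        C = np.maximum(C + mu * (G_C - (G_C * C).sum(axis=0)), 0.0)
        C /= np.maximum(C.sum(axis=0, keepdims=True), 1e-12)
    return C, S

# A FurthestSum initialization (see the sketch in section 2.2) would replace the
# random C by indicator columns: C = np.zeros((N, K)); C[idx, range(K)] = 1.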
2.4. kernel-AA

From the above updates it can be seen that the estimation of the parameters depends only on pairwise relations, i.e., on the inner products (the kernel) K = X^\top X. As such, the AA model trivially generalizes to kernel representations based on pairwise relations between the data points (kernel-AA), and can then be interpreted as extracting the principal convex hull in a potentially infinite-dimensional Hilbert space (similar to the corresponding interpretations of kernel-K-means and kernel-PCA). These types of analysis are, however, outside the scope of the current paper.
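As a sketch of this observation (ours; the paper leaves kernel-AA for future work), any positive semidefinite kernel matrix can stand in for the Gram matrix X^\top X, e.g. an RBF kernel:

import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix with entries exp(-gamma ||x_i - x_j||^2) over the columns of X."""
    sq = (X * X).sum(axis=0)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X.T @ X))

# Substituting rbf_gram(X) for X.T @ X in the aa_projected_gradient sketch above
# yields kernel-AA: the archetypes then live in the induced feature space and are
# represented only implicitly, through the columns of C.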

3. RESULTS

We demonstrate the utility of the AA model on four data sets taken from a variety of important machine learning problem domains.
Computer Vision: To illustrate the properties of the AA model, we compared the extracted model representation to the representations obtained by SVD/PCA, NMF and K-means on the CBCL face database of 361 pixels × 2429 images used in [6]. The results of the analysis are given in figure 3. SVD/PCA extracts features of low to high spatial frequency, while NMF gives a part-based representation as reported in [6]. K-means extracts cluster centers of anonymous 'typical' faces, while the features extracted by AA represent more distinct (archetypal) face prototypes exposing variability and face diversity. Thus AA accounts for more variation than K-means but less than SVD/PCA and NMF, as indicated in table 1, while the distinct facial aspects are efficiently extracted.

Fig. 3. Properties of features extracted by SVD/PCA, NMF, AA and K-means. Whereas SVD/PCA extracts low to high spatial frequency components and NMF decomposes the data into an atomic mixture of constituent parts, the AA model extracts notably more distinct aspects than K-means.

NeuroImaging: We analyzed a Positron Emission Tomography data set containing 40 time points × 157,244 voxels based on [18F]-Altanserin as radioligand, in order to measure serotonin-2A neuroreceptors. Each recorded voxel is a mixture of vascular regions, non-binding regions and high-binding regions. The AA model should ideally extract these profiles, as well as how each voxel is a convex mixture of these regions. This holds provided that a convex combination of the observations is able to generate the pure tissue profiles, i.e., can extract the distinct profiles. From figure 4, the AA model has indeed extracted three components that correspond well to non-binding, high-binding and vascular regions respectively. No such profiles are clearly extracted by SVD/PCA, NMF or K-means, which all extract components that are mixtures of the three region types.

Text mining: Latent semantic analysis has become an important tool for extracting word and document associations in text corpora. In figure 5 we demonstrate the utility of the AA model for this task by analyzing the NIPS bag-of-words corpus, consisting of 1,500 documents and 12,419 words with approximately 6,400,000 word occurrences (see also http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). Each word was normalized by the inverse document frequency (IDF). Clearly, the 10-component model has extracted distinct categories of NIPS papers. For the AA model, C indicates which documents constitute the extracted distinct aspects XC given in the figure, while S (not shown) gives the fraction by which each document resembles these aspects. Indeed, the extracted archetypes correspond well to distinct types of papers in the NIPS corpus.

Fig. 5. Analysis of the NIPS bag-of-words corpus: the 10 most prominent words for each feature component of the extracted 10-component NMF, AA and K-means models respectively. The average cosine angles (averaged over all 10(10 − 1)/2 feature comparisons) are 84.16°, 80.31° and 73.33° respectively; thus the AA model has extracted more distinct aspects than K-means. Clearly, each of the components corresponds well to aspects of the documents in the NIPS corpus. The components are ordered according to importance, and the gray scale gives the relative strength of each of the 10 top words for each extracted term group.
Collaborative filtering: Collaborative filtering has become an important machine learning problem, of which the celebrated Netflix prize is widely known. Given the preferences of users, the aim is to infer preferences for products a user has not yet rated, based on the ratings given by other users. We analyzed the medium-size and large-size MovieLens data, given by 1,000,209 ratings of 3,952 movies by 6,040 users and 10,000,054 ratings of 10,677 movies by 71,567 users, with ratings from {1, 2, 3, 4, 5} (http://www.grouplens.org/). The AA model extracts idealized users (extreme user behaviors) while at the same time relating the users to these archetypal preference types. Movies left unrated by the users were treated as missing values, using the following extension of the AA objective to accommodate missing information:

    \min_{C,S} \sum_{n,m : q_{n,m}=1} \Big( x_{n,m} - \sum_k \frac{\sum_{m'} x_{n,m'} c_{m',k}}{\sum_{m'} q_{n,m'} c_{m',k}} \, s_{k,m} \Big)^2,

where Q is an indicator matrix such that q_{n,m} = 1 if the nth movie was rated by the mth user and zero otherwise. Since the SVD and NMF models turned out to be more prone to local minima than the AA model, these methods were initialized with the obtained AA solution. From figure 6 it can be seen that, despite the AA model being more restricted than NMF and SVD, it has very similar test error performance and extracts features that are much more useful for predicting the ratings than K-means.
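A sketch of evaluating this masked objective (our rendering of the equation above; it assumes X stores zeros at unrated entries so that X @ C sums over observed ratings only, and the names are ours):

import numpy as np

def masked_aa_objective(X, Q, C, S):
    """The missing-data AA objective above (sketch).

    X : (movies x users) ratings, with zeros at unrated entries
    Q : (movies x users) indicator, Q[n, m] = 1 iff movie n was rated by user m
    C : (users x K) archetype weights;  S : (K x users) loadings
    """
    num = X @ C                       # sum_m' x[n, m'] c[m', k] over rated entries
    den = np.maximum(Q @ C, 1e-12)    # sum_m' q[n, m'] c[m', k]
    A = num / den                     # archetypal rating profiles
    return float((Q * (X - A @ S) ** 2).sum())

The masking confines both the archetype construction and the residual to the observed entries, which is what lets unrated movies be ignored rather than imputed.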
Fig. 6. Left and middle panels: root mean square error (RMSE) for the various models as a function of the number of components when analyzing the medium-size (left) and large-size (middle) MovieLens data. Training error is given by solid lines and test error performance by dotted lines, based on removing 10% of the data for testing. Clearly, the AA model has added flexibility over K-means in accounting for the data. Right panel: illustration of the extracted archetypal user types, XC, and their distinct preferences, for the 25-component AA model with the lowest validation error on the large-size problem.

4. DISCUSSION

We demonstrated how the archetypal analysis model of [1] is useful for a large variety of machine learning problems. A simple algorithm for fitting the AA model was derived, as well as the FurthestSum initialization procedure to extract end-members for initial archetypes. The utility of AA over clustering methods is that it focuses more on distinct or discriminative aspects, yet has additional modeling flexibility through soft assignment. We saw examples of improved interpretability of the AA representations over existing matrix factorization and clustering methods. An open problem is to determine the number of components used. This problem is no different from the problem of choosing the number of components in approaches such as SVD/PCA/NMF/SC/K-means; thus methods based on approximating the model evidence or the generalization error can be invoked. AA is a promising unsupervised learning tool for many machine learning problems, and as the representation is in general unique, we believe the method holds particularly great promise for data mining applications.

5. REFERENCES

[1] Adele Cutler and Leo Breiman, "Archetypal analysis," Technometrics, vol. 36, no. 4, pp. 338-347, Nov 1994.

[2] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.

[3] Gene H. Golub and Charles F. Van Loan, Matrix Computations, Johns Hopkins Studies in Mathematical Sciences, 3rd edition, 1996.

[4] Pierre Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.

[5] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.

[6] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.

[7] Emily Stone and Adele Cutler, "Introduction to archetypal analysis of spatio-temporal dynamics," Physica D, vol. 96, no. 1-4, pp. 110-131, 1996.

[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, "On the use of archetypes as benchmarks," Applied Stochastic Models in Business and Industry, vol. 24, no. 5, pp. 419-437, 2008.

[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, "Archetypal analysis of galaxy spectra," Monthly Notices of the Royal Astronomical Society, vol. 338, p. 790, 2003.

[10] A. Plaza, P. Martinez, R. Perez, and J. Plaza, "A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 3, pp. 650-663, 2004.

[11] D. McCallum and D. Avis, "A linear algorithm for finding the convex hull of a simple polygon," Information Processing Letters, vol. 9, pp. 201-206, 1979.

[12] Rex A. Dwyer, "On the convex hull of random points in a polytope," Journal of Applied Probability, vol. 25, no. 4, pp. 688-699, 1988.

[13] Christian Bauckhage and Christian Thurau, "Making archetypal analysis practical," in Proceedings of the 31st DAGM Symposium on Pattern Recognition, Berlin, Heidelberg, 2009, pp. 272-281, Springer-Verlag.
