
ARCHETYPAL ANALYSIS FOR MACHINE LEARNING

Morten Mørup and Lars Kai Hansen

Cognitive Systems Group, Technical University of Denmark


Richard Petersens Plads, bldg. 321, 2800 Lyngby, Denmark, e-mail: {mm,lkh}@imm.dtu.dk

ABSTRACT

Archetypal analysis (AA), proposed by Cutler and Breiman in [1], estimates the principal convex hull of a data set. As such, AA favors features that constitute representative 'corners' of the data, i.e., distinct aspects or archetypes. We will show that AA enjoys the interpretability of clustering without being limited to hard assignment, and the uniqueness of SVD without being limited to orthogonal representations. In order to do large scale AA, we derive an efficient algorithm based on projected gradient, as well as an initialization procedure inspired by the FurthestFirst approach widely used for K-means [2]. We demonstrate that the AA model is relevant for feature extraction and dimensionality reduction for a large variety of machine learning problems taken from computer vision, neuroimaging, text mining and collaborative filtering.

1. INTRODUCTION

Decomposition approaches have become a key tool for a wide variety of massive data analyses, from the modeling of Internet data such as term-document matrices of word occurrences, bio-informatics data such as micro-array data of gene expression, and neuroimaging data such as neural activity measured over space and time, to collaborative filtering problems such as the celebrated Netflix problem, to mention but a few. The conventional approaches range from low-rank approximations such as singular value decomposition (SVD), principal component analysis (PCA) [3], independent component analysis (ICA) [4], sparse coding (SC) [5] and non-negative matrix factorization (NMF) [6] to hard assignment clustering methods such as K-means and K-medoids. Common to these is that they can be understood as a linear mixture or factor analysis type representation of the data with various constraints. Thus data x_{m,n}, where m = 1, ..., M is the feature index and n = 1, ..., N is the sample index, is written in terms of hidden variables s_{k,n} and projections a_{m,k} with k = 1, ..., K,

    x_{m,n} = \sum_k a_{m,k} s_{k,n} + e_{m,n}, \quad e_{m,n} \sim \mathcal{N}(0, \sigma^2),    (1.1)

typically with a Gaussian noise model. SVD/PCA requires A and S to be orthogonal; in ICA statistical independence is assumed for S; in SC a penalty term is introduced that measures deviation from sparsity of S; while in NMF all variables are constrained to be non-negative. In hard clustering by K-means, S is constrained to be a binary assignment matrix such that A = XS^\top (SS^\top)^{-1} represents the Euclidean centers of the clusters, while for K-medoids a_k = x_n for some n, i.e., the cluster centers have to constitute actual data points.

Despite the similarities of the above approaches, their internal representations of the data differ greatly, and with them the nature of the interpretations they offer. In SVD/PCA the features constitute the directions of maximal variation, i.e., so-called eigenmaps; for NMF the features are constituent parts; for SC the features are likewise atoms or dictionary elements; while K-means and K-medoids find the most representative prototype objects.

A benefit of clustering approaches is that the features are similar to measured data, making the results easier to interpret; however, the binary assignments reduce flexibility. Also, clustering typically involves complex combinatorial optimization, leading to a plethora of heuristics. On the other hand, low-rank approximations based on SVD/PCA/NMF have a great degree of flexibility, but the features can be harder to interpret, as invariance to rotation of the extracted features can lead to lack of uniqueness, i.e., X ≈ AS = AQQ^{-1}S = \tilde{A}\tilde{S}. In addition, SVD/PCA/ICA/SC are prone to cancellation effects in which two components both lose meaning because they locally become highly correlated, taking positive and negative near-cancelling values (while still being globally orthogonal).

In conclusion, clustering approaches give easily interpretable features but pay a price in modeling flexibility due to the binary assignment of data objects. Approaches such as SVD/PCA/ICA/NMF/SC have added model flexibility and as such can be more efficient in capturing, e.g., variance; however, this efficiency can lead to complex representations from which we learn relatively little.

Archetypal analysis (AA), proposed by [1], directly combines the virtues of clustering and the flexibility of matrix factorization.

(We thank Gitte Moos Knudsen and Claus Svarer for providing the PET data set.)
In the original paper on AA [1], the method was demonstrated useful in the analysis of air pollution and head shape, and later also for tracking spatio-temporal dynamics [7]. Recently, archetypal analysis has found use in benchmarking and market research, identifying typically extreme practices rather than just good practices [8], as well as in the analysis of astronomy spectra [9] as an approach to the end-member extraction problem [10]. In this paper we demonstrate the following important theoretical properties of AA:

• The archetypal analysis (AA) model is unique.

• AA can be efficiently initialized through the proposed FurthestSum method.

• AA can be efficiently computed using a simple projected gradient method.

We further demonstrate that AA is useful for a wide variety of important machine learning problem domains, resulting in easily interpretable features that account well for the inherent dynamics in the data.

2. ARCHETYPAL ANALYSIS AND THE PRINCIPAL CONVEX HULL

The convex hull, also denoted the convex envelope, of a data matrix X is the minimal convex set containing X. Informally, it can be described as a rubber band wrapped around the data points, see also figure 1. While the problem of finding the convex hull is solvable in linear time (i.e., O(N)) [11], the size of the convex set increases dramatically with the dimensionality of the data: the expected size of the convex set for N points in general position in K-dimensional space grows exponentially with dimension as O(log^{K-1}(N)) [12]. As a result, in high-dimensional spaces the 'minimal' convex set forming the convex hull does not provide a compact data representation, see also figure 1. Archetypal analysis [1] instead considers the principal convex hull, i.e., the (K-1)-dimensional convex hull that best accounts for the data according to some measure of distortion D(·|·). This can formally be stated as finding the optimal C and S for the problem

    \arg\min_{C,S} D(X \mid XCS)
    \quad \text{s.t.} \quad |c_k|_1 = 1, \; |s_n|_1 = 1, \; C \ge 0, \; S \ge 0.

The constraint |c_k|_1 = 1 together with C ≥ 0 enforces the feature matrix A = XC, as given in equation (1.1), to be a weighted average (i.e., a convex combination) of the data observations, while the constraint |s_n|_1 = 1, S ≥ 0 requires the nth data point to be approximated by a weighted average (i.e., a convex combination) of the feature vectors XC ∈ R^{M×K}. For brevity we will presently consider D(X|XCS) = ||X − XCS||_F^2. As such, in line with principal component analysis, the optimal C, S will generate the principal (i.e., dominant) convex hull for the data X, see figure 1. Archetypal analysis favors features that constitute representative 'corners' of the data, i.e., distinct aspects or archetypes. Furthermore, the AA model can naturally be considered a model between low-rank factor-type approximations and clustering approaches, see table 1. As for K-means and K-medoids, archetypal analysis is invariant to scale and translation of the data, and as noted in [1] the AA problem is non-convex.
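To make the model and its constraints concrete, the following minimal numpy sketch (our illustration, not from the paper; sizes and variable names are hypothetical) constructs a feasible pair (C, S) and evaluates the least squares distortion D(X|XCS):

import numpy as np

rng = np.random.default_rng(0)
M, N, K = 5, 200, 4                       # features, samples, archetypes
X = rng.standard_normal((M, N))

# A feasible (C, S): non-negative with unit l1 column sums,
# i.e. |c_k|_1 = 1, C >= 0 and |s_n|_1 = 1, S >= 0.
C = rng.random((N, K))
C /= C.sum(axis=0, keepdims=True)
S = rng.random((K, N))
S /= S.sum(axis=0, keepdims=True)

A = X @ C                                 # archetypes: convex combinations of data points
D = np.linalg.norm(X - A @ S, "fro")**2   # distortion D(X|XCS) = ||X - XCS||_F^2
print(D)

Constraining S to binary assignments instead would recover the K-means/K-medoids columns of table 1.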
Fig. 1. Illustration of the AA representation of data in 2D (left panel) and 3D (middle and right panels). Blue dots and lines indicate the convex sets and hulls respectively, whereas green lines and dark shaded regions indicate the extracted principal convex hulls. Notice how the four-component principal convex hulls (left and middle panels) and the five-component convex hull (right panel) account for most of the dynamics in the data, while the complete convex hull, even in 2D, is given by a rather large convex set. The end points of the principal convex hull are convex combinations of the data points. These end points constitute the distinct regions of the clusters, rather than the central regions that would be extracted by K-means. As such, the 5th cluster, given in cyan (right panel), is modeled as a combination of existing aspects rather than given a feature of its own.

2.1. Uniqueness of AA

Lack of uniqueness in matrix decomposition is a main motivation behind rotational criteria such as varimax in factor analysis, as well as behind imposing statistical independence in ICA [4] and sparsity [5]. We presently prove that AA is in general unique up to a permutation of the components.

Theorem 1. Assume ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k. Then the AA model does not suffer from rotational ambiguity, i.e., if X ≈ XCS = XCQ^{-1}QS = X\tilde{C}\tilde{S} such that both (C, S) and (\tilde{C}, \tilde{S}) are equivalent solutions to AA, then Q is a permutation matrix.

Proof. Since S and \tilde{S} = QS, with S ≥ 0 and \tilde{S} ≥ 0, are both solutions to AA, we have |s_n|_1 = 1 and |Qs_n|_1 = 1 for all n. For this to hold, Q has to be a Markov matrix, i.e., \sum_{k'} q_{k,k'} = 1, Q ≥ 0. Since C ≥ 0 and CQ^{-1} ≥ 0, and given ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k, Q^{-1} also has to be non-negative. Since both Q and Q^{-1} are non-negative, Q can only be a scale-and-permutation matrix, and as the rows of Q must sum to one, Q can only be a permutation matrix.
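Theorem 1 is easy to check numerically; a small sketch (ours, not from the paper) confirms that a permuted solution reproduces XCS exactly while remaining feasible:

import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 4
X = rng.standard_normal((3, N))
C = rng.random((N, K)); C /= C.sum(axis=0)
S = rng.random((K, N)); S /= S.sum(axis=0)

Q = np.eye(K)[rng.permutation(K)]       # a permutation matrix; Q^{-1} = Q^T
C2, S2 = C @ Q.T, Q @ S                 # the transformed solution (C Q^{-1}, Q S)
assert np.allclose(X @ C @ S, X @ C2 @ S2)   # identical reconstruction
assert np.allclose(C2.sum(axis=0), 1) and np.allclose(S2.sum(axis=0), 1)  # still feasible

A generic invertible Q would instead violate the non-negativity or unit-sum constraints, which is the content of the proof above.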
         SVD/PCA    NMF       AA/PCH                K-means               K-medoids
    C    C ∈ R      XC ≥ 0    |c_k|_1 = 1, C ≥ 0    |c_k|_1 = 1, C ≥ 0    |c_k|_1 = 1, C ∈ B
    S    S ∈ R      S ≥ 0     |s_n|_1 = 1, S ≥ 0    |s_n|_1 = 1, S ∈ B    |s_n|_1 = 1, S ∈ B

Table 1. Relation between the AA/PCH model and unsupervised methods such as SVD/PCA, NMF, K-means and K-medoids (B denotes binary matrices).

The requirement ∀k ∃n : c_{n,k} > 0 ∧ c_{n,k'} = 0 for k' ≠ k states that for each column of C there has to exist a row in which that column's element is the only non-zero element. This holds for the AA model, as two distinct aspects in general position will not be convex combinations of the same data points. We note that although AA is unique in general, there is no guarantee that the optimal solution will be identified, due to the occurrence of local minima.

2.2. Efficient initialization of AA by FurthestSum

Cutler and Breiman point out in [1] that careful initialization improves the speed of convergence and lowers the risk of finding insignificant archetypes. For K-means, a popular initialization procedure is the FurthestFirst method described in [2]: a data point is selected at random as the centroid of a cluster, and subsequent data points are selected furthest away from the already selected points. As such, a new data point j^{new} is selected according to

    j^{\mathrm{new}} = \arg\max_i \min_{j \in \mathcal{C}} \|x_i - x_j\|,    (2.1)

where \|\cdot\| is a given norm and \mathcal{C} indexes the currently selected data points. For initializing AA we propose the following modification, forming our FurthestSum procedure:

    j^{\mathrm{new}} = \arg\max_i \sum_{j \in \mathcal{C}} \|x_i - x_j\|.    (2.2)

To improve the identified set \mathcal{C}, the first point, which was selected at random, is removed and an additional point is selected in its place. For the proposed FurthestSum method we have the following important property.

Theorem 2. The points generated by the FurthestSum algorithm are guaranteed to lie in the minimal convex set of the unselected data points.

Proof. We prove the theorem by contradiction. Assume that there is a selected point t not in the minimal convex set, i.e., x_t = Xc with |c|_1 = 1, c_d ≥ 0, c_t = 0, while t = \arg\max_i \sum_{j \in \mathcal{C}} \|x_i - x_j\|. We then have

    \sum_{j \in \mathcal{C}} \|x_t - x_j\| = \sum_{j \in \mathcal{C}} \|Xc - x_j\| < \sum_d c_d \sum_{j \in \mathcal{C}} \|x_d - x_j\| \le \max_d \sum_{j \in \mathcal{C}} \|x_d - x_j\|,

where the first inequality follows from the triangle inequality. Hence a better solution is given by a point different from t, contradicting that t was the optimal point.

Fig. 2. Illustration of the prototypes extracted by FurthestFirst [2] (green circles) and the proposed FurthestSum initialization procedure (red circles) for 4 (left), 25 (middle) and 100 (right) prototypes. Clearly, FurthestSum extracts points belonging to the convex set of the unselected data (useful for AA), whereas FurthestFirst distributes the prototypes evenly over the data region (useful for K-means).

A comparison between the FurthestFirst and FurthestSum initialization procedures can be found in figure 2, based on the 2-norm \|x_i - x_j\|_2. The primary computational cost of the FurthestSum procedure for the identification of T candidate points is the evaluation of the distances between all data points and the selected candidate points, which has an overall computational complexity of O(MNT).
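Written out, a minimal sketch of the FurthestSum selection under the 2-norm (our reading of equation (2.2) and of the re-selection step above; function and variable names are ours):

import numpy as np

def furthest_sum(X, K, seed=0):
    """Pick K candidate archetype indices from the columns of X (M x N)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    idx = [int(rng.integers(N))]                        # random initial point
    sum_dist = np.linalg.norm(X - X[:, idx[0]][:, None], axis=0)
    while len(idx) < K:                                 # eq. (2.2): maximize summed distance
        cand = sum_dist.copy()
        cand[idx] = -np.inf                             # exclude already selected points
        j = int(np.argmax(cand))
        idx.append(j)
        sum_dist += np.linalg.norm(X - X[:, j][:, None], axis=0)
    # Improve the set: remove the random first point and select one in its place.
    sum_dist -= np.linalg.norm(X - X[:, idx[0]][:, None], axis=0)
    idx.pop(0)
    cand = sum_dist.copy()
    cand[idx] = -np.inf
    idx.append(int(np.argmax(cand)))
    return idx

By Theorem 2, the selected points lie in the minimal convex set of the unselected data, which is what makes them suitable initial archetypes.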
2.3. Projected Gradient for AA

We presently propose a simple projected gradient procedure that can be adapted to any proximity measure; in the present analysis we will, however, without loss of generality consider proximity measured by least squares. In [1] the model was estimated by non-negative least squares, with the linear constraints enforced by introducing quadratic penalty terms in the alternating updates of S and C, i.e., by minimizing

    \|X - XCS\|_F^2 + M \sum_d (|c_d|_1 - 1)^2 + M \sum_j (|s_j|_1 - 1)^2,

where M is some large number. Alternatively, standard non-negative quadratic programming solvers with linear constraints can be invoked [13] for each alternating subproblem, solving for S with C fixed and vice versa. We found, however, that the following projected gradient method worked efficiently in practice.
Fig. 4. Analysis of an Altanserin-tracer Positron Emission Tomography data set. Top panel: the extracted temporal profiles of three-component SVD/PCA, NMF, AA and K-means models. Of the four models, only the AA model has correctly extracted the region-specific temporal profiles which, judging from the spatial maps in the bottom panel, correspond well to non-binding, high-binding and vascular regions respectively. As there is no scaling ambiguity, the estimated arterial tissue curves are in the correct units.

We recast the AA problem in the l1-normalization-invariant variables \tilde{s}_{k,n} = s_{k,n} / \sum_{k'} s_{k',n} and \tilde{c}_{n,k} = c_{n,k} / \sum_{n'} c_{n',k}, such that the equality constraints are explicitly satisfied. Noticing that

    \frac{\partial \tilde{s}_{k',n}}{\partial s_{k,n}} = \frac{\delta_{k,k'}}{\sum_{k''} s_{k'',n}} - \frac{s_{k',n}}{(\sum_{k''} s_{k'',n})^2}

and differentiating by parts, we find the following updates for the AA parameters recast in the above normalization-invariant variables:

    s_{k,n} \leftarrow \max\{\tilde{s}_{k,n} + \mu_{\tilde{S}} (g^{\tilde{S}}_{k,n} - \sum_{k'} g^{\tilde{S}}_{k',n} \tilde{s}_{k',n}), 0\}, \quad G^{\tilde{S}} = \tilde{C}^\top X^\top X - \tilde{C}^\top X^\top X \tilde{C}\tilde{S},

    c_{n,k} \leftarrow \max\{\tilde{c}_{n,k} + \mu_{\tilde{C}} (g^{\tilde{C}}_{n,k} - \sum_{n'} g^{\tilde{C}}_{n',k} \tilde{c}_{n',k}), 0\}, \quad G^{\tilde{C}} = X^\top X \tilde{S}^\top - X^\top X \tilde{C}\tilde{S}\tilde{S}^\top.

Each alternating update is performed on all elements simultaneously, and \mu is a step-size parameter that we tuned by line-search. In the update of S, the products \tilde{C}^\top X^\top X and \tilde{C}^\top X^\top X \tilde{C} can be pre-computed at a cost of O(KMN), while computing the gradient as well as evaluating the least squares objective, given by

    \text{const.} - 2\langle \tilde{C}^\top X^\top X, \tilde{S} \rangle + \langle \tilde{C}^\top X^\top X \tilde{C}, \tilde{S}\tilde{S}^\top \rangle,

has computational cost O(K^2 N). In the update of C, X^\top X \tilde{S}^\top and \tilde{S}\tilde{S}^\top can be pre-computed at costs of O(KMN) and O(K^2 N) respectively, while computing the gradient as well as evaluating the least squares objective, given by

    \text{const.} - 2\langle X^\top X \tilde{S}^\top, \tilde{C} \rangle + \langle \tilde{C}^\top X^\top X \tilde{C}, \tilde{S}\tilde{S}^\top \rangle,

also has a computational complexity of O(KMN). When only considering T candidate points to define the archetypes, as proposed in [13], this latter complexity can be further reduced to O(KMT). In [13] it was suggested to identify these candidate points as outlying data points found through projections onto the eigenvectors of the covariance matrix of X. We found that the proposed FurthestSum algorithm forms an efficient alternative for identifying these T candidate points. In our implementation of the projected gradient method we carried out 10 line-search updates for each alternating update of S and C.
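A compact numpy rendering of this alternating scheme (our sketch, not the paper's exact implementation: it uses a small fixed step size rather than line-search, and an explicit column renormalization in place of the invariant-variable bookkeeping):

import numpy as np

def aa_projected_gradient(X, K, n_iter=500, mu=0.01, seed=0):
    """Alternating projected gradient for least squares AA (sketch)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    C = rng.random((N, K)); C /= C.sum(axis=0, keepdims=True)
    S = rng.random((K, N)); S /= S.sum(axis=0, keepdims=True)
    XtX = X.T @ X                              # only pairwise inner products are needed
    for _ in range(n_iter):
        CtXtX = C.T @ XtX                      # precomputation for the S update
        G_S = CtXtX - (CtXtX @ C) @ S          # G^S = C'X'X - C'X'XCS
        S = np.maximum(S + mu * (G_S - (G_S * S).sum(axis=0)), 0.0)
        S /= np.maximum(S.sum(axis=0, keepdims=True), 1e-12)
        G_C = XtX @ S.T - XtX @ C @ (S @ S.T)  # G^C = X'XS' - X'XCSS'
        C = np.maximum(C + mu * (G_C - (G_C * C).sum(axis=0)), 0.0)
        C /= np.maximum(C.sum(axis=0, keepdims=True), 1e-12)
    return C, S

# A FurthestSum initialization (see the sketch in section 2.2) would replace the
# random C by indicator columns: C = np.zeros((N, K)); C[idx, range(K)] = 1.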
2.4. kernel-AA

From the above updates it can be seen that the estimation of the parameters depends only on pairwise relations, i.e., on the inner products (the kernel) K = X^\top X. As such, the AA model trivially generalizes to kernel representations based on pairwise relations between the data points (kernel-AA), and can then be interpreted as extracting the principal convex hull in a potentially infinite-dimensional Hilbert space (similar to the corresponding interpretations of kernel-K-means and kernel-PCA). These types of analysis are, however, outside the scope of the current paper.
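As a sketch of this observation (ours; the paper leaves kernel-AA for future work), any positive semidefinite kernel matrix can stand in for the Gram matrix X^\top X, e.g. an RBF kernel:

import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix with entries exp(-gamma ||x_i - x_j||^2) over the columns of X."""
    sq = (X * X).sum(axis=0)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X.T @ X))

# Substituting rbf_gram(X) for X.T @ X in the aa_projected_gradient sketch above
# yields kernel-AA: the archetypes then live in the induced feature space and are
# represented only implicitly, through the columns of C.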

3. RESULTS

We demonstrate the utility of the AA model on four data sets taken from a variety of important machine learning problem domains.
Computer Vision: To illustrate the properties of the AA model, we compared the extracted model representation to the representations obtained by SVD/PCA, NMF and K-means on the CBCL face database of 361 pixels × 2429 images used in [6]. The results of the analysis are given in figure 3. SVD/PCA extracts features of low to high spatial frequency, while NMF gives a part-based representation as reported in [6]. K-means extracts cluster centers of anonymous 'typical' faces, while the features extracted by AA represent more distinct (archetypal) face prototypes exposing variability and face diversity. Thus AA accounts for more variation than K-means but less than SVD/PCA and NMF, as indicated in table 1, while the distinct facial aspects are efficiently extracted.

Fig. 3. Properties of features extracted by SVD/PCA, NMF, AA and K-means. Whereas SVD/PCA extracts low to high spatial frequency components and NMF decomposes the data into an atomic mixture of constituent parts, the AA model extracts notably more distinct aspects than K-means.

NeuroImaging: We analyzed a Positron Emission Tomography data set containing 40 time points × 157,244 voxels based on [18F]-Altanserin as radioligand, in order to measure serotonin-2A neuroreceptors. Each recorded voxel is a mixture of vascular regions, non-binding regions and high-binding regions. The AA model should ideally extract these profiles, as well as how each voxel is a convex mixture of these regions. This holds provided that a convex combination of the observations is able to generate the pure tissue profiles, i.e., can extract the distinct profiles. From figure 4, the AA model has indeed extracted three components that correspond well to non-binding, high-binding and vascular regions respectively. No such profiles are clearly extracted by SVD/PCA, NMF or K-means, which all extract components that are mixtures of the three region types.

Text mining: Latent semantic analysis has become an important tool for extracting word and document associations in text corpora. In figure 5 we demonstrate the utility of the AA model for this task by analyzing the NIPS bag-of-words corpus, consisting of 1,500 documents and 12,419 words with approximately 6,400,000 word occurrences (see also http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). Each word was normalized by the inverse document frequency (IDF). Clearly, the 10-component model has extracted distinct categories of NIPS papers. For the AA model, C indicates which documents constitute the extracted distinct aspects XC given in the figure, while S (not shown) gives the fraction by which each document resembles these aspects. Indeed, the extracted archetypes correspond well to distinct types of papers in the NIPS corpus.

Fig. 5. Analysis of the NIPS bag-of-words corpus: the 10 most prominent words for each feature component of the extracted 10-component NMF, AA and K-means models respectively. The average cosine angles (averaged over all 10(10 − 1)/2 feature comparisons) are 84.16°, 80.31° and 73.33° respectively; thus the AA model has extracted more distinct aspects than K-means. Clearly, each of the components corresponds well to aspects of the documents in the NIPS corpus. The components are ordered according to importance, and the gray scale gives the relative strength of each of the 10 top words for each extracted term group.
Collaborative filtering: Collaborative filtering has become an important machine learning problem, of which the celebrated Netflix prize is widely known. Given the preferences of users, the aim is to infer preferences for products a user has not yet rated, based on the ratings given by other users. We analyzed the medium-size and large-size MovieLens data, given by 1,000,209 ratings of 3,952 movies by 6,040 users and 10,000,054 ratings of 10,677 movies by 71,567 users, with ratings from {1, 2, 3, 4, 5} (http://www.grouplens.org/). The AA model extracts idealized users (extreme user behaviors) while at the same time relating the users to these archetypal preference types. Movies left unrated by the users were treated as missing values, using the following extension of the AA objective to accommodate missing information:

    \min_{C,S} \sum_{n,m : q_{n,m}=1} \Big( x_{n,m} - \sum_k \frac{\sum_{m'} x_{n,m'} c_{m',k}}{\sum_{m'} q_{n,m'} c_{m',k}} \, s_{k,m} \Big)^2,

where Q is an indicator matrix such that q_{n,m} = 1 if the nth movie was rated by the mth user and zero otherwise. Since the SVD and NMF models turned out to be more prone to local minima than the AA model, these methods were initialized with the obtained AA solution. From figure 6 it can be seen that, despite the AA model being more restricted than NMF and SVD, it has very similar test error performance and extracts features that are much more useful for predicting the ratings than K-means.
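A sketch of evaluating this masked objective (our rendering of the equation above; it assumes X stores zeros at unrated entries so that X @ C sums over observed ratings only, and the names are ours):

import numpy as np

def masked_aa_objective(X, Q, C, S):
    """The missing-data AA objective above (sketch).

    X : (movies x users) ratings, with zeros at unrated entries
    Q : (movies x users) indicator, Q[n, m] = 1 iff movie n was rated by user m
    C : (users x K) archetype weights;  S : (K x users) loadings
    """
    num = X @ C                       # sum_m' x[n, m'] c[m', k] over rated entries
    den = np.maximum(Q @ C, 1e-12)    # sum_m' q[n, m'] c[m', k]
    A = num / den                     # archetypal rating profiles
    return float((Q * (X - A @ S) ** 2).sum())

The masking confines both the archetype construction and the residual to the observed entries, which is what lets unrated movies be ignored rather than imputed.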
Fig. 6. Left and middle panels: root mean square error (RMSE) for the various models as a function of the number of components when analyzing the medium-size (left) and large-size (middle) MovieLens data. Training error is given by solid lines and test error performance by dotted lines, based on removing 10% of the data for testing. Clearly, the AA model has added flexibility over K-means in accounting for the data. Right panel: illustration of the extracted archetypal user types, XC, and their distinct preferences, for the 25-component AA model with the lowest validation error on the large-size problem.

4. DISCUSSION

We demonstrated how the archetypal analysis model of [1] is useful for a large variety of machine learning problems. A simple algorithm for fitting the AA model was derived, as well as the FurthestSum initialization procedure to extract end-members for initial archetypes. The utility of AA over clustering methods is that it focuses more on distinct or discriminative aspects, yet has additional modeling flexibility through soft assignment. We saw examples of improved interpretability of the AA representations over existing matrix factorization and clustering methods. An open problem is to determine the number of components used. This problem is no different from the problem of choosing the number of components in approaches such as SVD/PCA/NMF/SC/K-means; thus methods based on approximating the model evidence or the generalization error can be invoked. AA is a promising unsupervised learning tool for many machine learning problems, and as the representation is in general unique, we believe the method holds particularly great promise for data mining applications.

5. REFERENCES

[1] Adele Cutler and Leo Breiman, "Archetypal analysis," Technometrics, vol. 36, no. 4, pp. 338-347, Nov 1994.

[2] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the k-center problem," Mathematics of Operations Research, vol. 10, no. 2, pp. 180-184, 1985.

[3] Gene H. Golub and Charles F. Van Loan, Matrix Computations, Johns Hopkins Studies in Mathematical Sciences, 3rd edition, 1996.

[4] Pierre Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287-314, 1994.

[5] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607-609, 1996.

[6] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.

[7] Emily Stone and Adele Cutler, "Introduction to archetypal analysis of spatio-temporal dynamics," Physica D, vol. 96, no. 1-4, pp. 110-131, 1996.

[8] Giovanni C. Porzio, Giancarlo Ragozini, and Domenico Vistocco, "On the use of archetypes as benchmarks," Applied Stochastic Models in Business and Industry, vol. 24, no. 5, pp. 419-437, 2008.

[9] B. H. P. Chan, D. A. Mitchell, and L. E. Cram, "Archetypal analysis of galaxy spectra," Monthly Notices of the Royal Astronomical Society, vol. 338, p. 790, 2003.

[10] A. Plaza, P. Martinez, R. Perez, and J. Plaza, "A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 3, pp. 650-663, 2004.

[11] D. McCallum and D. Avis, "A linear algorithm for finding the convex hull of a simple polygon," Information Processing Letters, vol. 9, pp. 201-206, 1979.

[12] Rex A. Dwyer, "On the convex hull of random points in a polytope," Journal of Applied Probability, vol. 25, no. 4, pp. 688-699, 1988.

[13] Christian Bauckhage and Christian Thurau, "Making archetypal analysis practical," in Proceedings of the 31st DAGM Symposium on Pattern Recognition, Berlin, Heidelberg, 2009, pp. 272-281, Springer-Verlag.
