0% found this document useful (0 votes)

11 views

Clustering Methods For Spherical Data: An Overview and A New Generalization

This document provides an overview of clustering methods for spherical data. It discusses applications of spherical clustering in text mining and gene expression analysis. It also describes popular distance measures like cosine similarity and information theoretic measures used in spherical clustering. Furthermore, it outlines distance-based clustering algorithms like spherical K-means and fuzzy C-direction for spherical data.

Uploaded by

Omar Alejandro Reyna Gutiérrez

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views

Clustering Methods For Spherical Data: An Overview and A New Generalization

Uploaded by

Omar Alejandro Reyna Gutiérrez

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/324037906

Clustering Methods for Spherical Data: An Overview and a New Generalization

Chapter · January 2018

DOI: 10.1007/978-981-10-8168-2_11

CITATIONS READS
3 1,327

2 authors:

Sungsu Kim Ashis Sengupta

University of Louisiana at Lafayette Indian Statistical Institute
26 PUBLICATIONS 96 CITATIONS 97 PUBLICATIONS 1,308 CITATIONS

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Generalized Variance - Overall Variability View project

Big Data Analytics on Circular Statistics View project

All content following this page was uploaded by Sungsu Kim on 23 December 2018.

The user has requested enhancement of the downloaded file.

Clustering Methods for Spherical Data: an
Overview and a New Generalization

Sungsu Kim and Ashis SenGupta

Abstract Recent advances in data acquisition technologies have led to massive

amount of data collected routinely in information sciences and technology, as well
as engineering sciences. In this big data era, a clustering analysis is a fundamen-
tal and crucial step in an attempt to explore structures and patterns in massive
datasets, where clustering objects (data) are represented as vectors. Often such high
dimensional vectors are L2 normalized so that they lie on the surface of unit hyper-
sphere, transforming them to spherical data. Thus clustering such data is equivalent
to grouping spherical data, where either cosine similarity or correlation is a desired
metric to identify similar observations, rather than Euclidean similarity metrics. In
this chapter, an overview of different clustering methods for spherical data in the
literature is presented. A model-based generalization for asymmetric spherical data
is also introduced.

1 Introduction

A cluster analysis refers to finding of natural groups (clusters) from a data set,
when little or nothing is known about the category structure. A cluster analysis di-
vides data into groups (clusters) that are meaningful, useful, or both. A data cluster-
ing belongs to the core methods of data mining, in which one focuses on large data
sets with unknown underlying structure. One can broadly categorize clustering ap-
proaches to be either model-based (parametric) or distance-based (non-parametric
or prototype-based).

Sungsu Kim
Department of Mathematics, University of Louisiana at Lafayette, U.S.A.
e-mail: [email protected]
Ashis SenGupta
Applied Statistics Unit, Indian Statistical Institute, Kolkata, India.
e-mail: [email protected]

1
2 Sungsu Kim and Ashis SenGupta

In distance-based methods, a cluster is an aggregation of (data) objects in a mul-

tidimensional space such that objects in a cluster are more similar to each other
than to objects in other clusters, and the choice of a distance measure between
two objects, called similarity (or dissimilarity) measure, is one of the key issues.
Most distance-based methods for linear data are based on the K-means method,
fuzzy C-means clustering algorithm, which are called flat partitioning, or hierar-
chical method (Johnson and Wichern, 2008). Flat partitioning clustering algorithms
have been recognized to be more suitable as opposed to the hierarchical clustering
schemes for processing large datasets. In model-based methods, clusters represent
different populations existing in a data set. Hence, a mixture model is a way to
perform a parametric clustering analysis. Extensive details on mixture models for
linear data are given by Everitt and Hand (1981). However, compared to linear data,
researches on both distance-based and model-based clustering methods for spheri-
cal data are emerging only recently. In the next section, we present an overview of
clustering methods for spherical data and an alternative model-based methods for
asymmetric spherical data.

2 What is Spherical Clustering?

In this big data era, a clustering analysis is a fundamental and crucial step in
an attempt to explore structures and patterns in massive datasets, where clustering
objects (data) are represented as vectors. Often such high dimensional vectors are L2
normalized so that they lie on the surface of unit hypersphere, transforming them to
spherical data. In spherical (directional) clustering (i.e. clustering of spherical data),
a set of data vectors is partitioned into groups, where the distance used to group the
vectors is the angle between them. That is, data vectors are grouped depending on
the direction into which they point, but the overall vector length does not influence
the clustering result. The goal of spherical clustering is thus to find a partition in
which clusters are made up of vectors that roughly point in the same direction. For
distance-based methods, cosine similarity, instead of Euclidean distance, is mostly
used, which measures the cosine of an angle formed by two vectors. For model-
based methods, popular mixture models such as a mixture of multi-variate Gaussian
distributions are inadequate, and the use of a spherical distribution in a mixture
model is required.

2.1 Applications of Spherical Clustering

Two main applications of spherical clustering are found in text mining and gene
expression analysis. In document clustering (or text mining), text documents are
grouped based on their features, often described in frequencies (counts) of words,
after removing stop words and word stemming operation. Using words as features,
Clustering Methods for Spherical Data: an Overview and a New Generalization 3

text documents are often represented as high-dimensional and sparse vectors - a few
thousands dimensions and a sparsity of 95 to 99% is typical. In order to remove the
biases induced by different lengths of documents, the data are normalized to have
the unit length, ignoring overall lengths of documents. In other words, documents
with a similar composition but different lengths will be grouped together (Dhillon
and Modha, 2001).
Gene expression profile data is usually represented by a matrix of expression
levels, with rows corresponding to genes and columns to conditions, experiments or
time points. Each row vector is the expression pattern of a particular gene across all
the conditions. Since the goal of gene expression clustering is to detect groups of
genes that exhibit similar expression patterns, the data are standardized so that genes
have mean zero and variance 1, removing the effect of magnitude of expression level
(Banerjee et al., 2005).
As a special case of spherical clustering, the spatial clustering is used for agri-
cultural insurance claims and earthquake occurrences (SenGupta, 2016). Other ap-
plications of spherical clustering found in the literature include
• fMRI, white matter supervoxel segmentation and brain imaging in Biomedicine
• spatial fading and blind speech segregation in Signal processing
• exoplanet data clustering in Astrophysics
• environmental pollution data in Environmental sciences.

2.2 Distance-based Methods

2.2.1 Similarity Measures in Spherical Clustering

Cosine similarity measure quantifies similarity between two spherical objects as

the cosine of the angle between vectors. Cosine similarity measure is one of the
most popular similarity measure performed in spherical clustering applications. Co-
sine similarity measure is non-negative and bounded between [0,1], and Pearson
correlation is exactly cosine similarity measure when data are standardized to have
mean zero and variance 1.
The Jaccard coefficient, which is sometimes referred to as the Tanimoto coef-
ficient, measures similarity as the intersection divided by the union of two objects
and ranges between 0 and 1. For text document, the Jaccard coefficient compares
the sum weight of shared terms to the sum weight of terms that are present in either
of the two documents but are not the shared terms.
In information theory based clustering, a vector is considered as a probability
distribution of elements, and similarity of two vectors is measured as the distance
between two corresponding probability distributions. The Kullback Leibler diver-
gence (KL divergence), also called the relative entropy, is a widely applied measure
for evaluating the difference between two probability distributions. However, unlike
4 Sungsu Kim and Ashis SenGupta

the previous similarity measures, the KL divergence is not symmetric, as a result,

the averaged KL divergence is used in the literature.

2.2.2 Spherical K-means and Fuzzy C-direction Algorithms

When data lie on the unit circle, the circular distance between two objects is
given by cos(α1 − α2 ), where α1 and α2 are corresponding angles (Jammalamadaka
and SenGupta, 2001). Generalizing the circular distance to unit hypersphere, co-
sine similarity between two unit vectors, say y1 and y2 , is defined to be the in-
ner product of y1 and y2 , denoted < y1 , y2 >. Suppose n spherical data points are
subject to a classification into K groups. Spherical K-means algorithm minimizes
∑Kk=1 ∑ni=1 µki < yi , pk >, where µki = 1 if yi belongs to cluster k (and otherwise
µki = 0), and pk denotes a prototype (cluster center) vector for cluster k. The opti-
mization process consists of alternating updates of the memberships and the cluster
centers. Given a set of data objects and a pre-specified number of clusters K, K clus-
ters are initialized. each one being the centroid of a cluster. The remaining objects
are then assigned to the cluster represented by the nearest or most similar centroid,
which is an updating membership step. Next, new centroids are re-computed for
each cluster and in turn all data objects are re-assigned based on the new centroids,
which is an updating cluster center step. These 2 steps iterate until a converged and
fixed solution is reached, where all data objects remain in the same cluster after an
update of centroids (Hornik et al., 2012).
It is known that for complex data sets containing overlapping clusters, fuzzy par-
titions model the data better than their crisp counterparts. In fuzzy C-means cluster-
ing algorithm for spherical data, each data point belongs to more than one cluster
with a different membership value (Kesemen et al., 2016). Fuzzy C-means algorithm
for spherical data uses the following criterion
K n
min ∑
B,M
∑ νkim dki2 , (1)
k=1 i=1

where M is a matrix of fuzzy memberships denoted by νki and m > 1, B is a matrix

with centroid column vectors, and dki denotes a similarity measure between object i
and centroid k.

2.2.3 Issues Related to Distance-based Methods

• The number of clusters needs to be provided.

• Different initialization of K clusters can produce difference clustering results.
• Convergence is local and the globally optimal solution can not be guaranteed.
Though, fuzzy C-means algorithm is less prone to local or sub-optimal solutions.
• Convergence is relative slow in high dimension.
Clustering Methods for Spherical Data: an Overview and a New Generalization 5

2.3 Model-based Methods: Mixture Models

Suppose a data set consists of n spherical objects (i.e. unit vectors) {y1 , y2 , . . . , yn } ∈
S p−1 that one wants to divide into K homogeneous groups. Denoting by g(·) a prob-
ability density function of Y , the mixture model is
K
g(y) = ∑ πk f (y; θk ), (2)
k=1

where πk (with ∑Kk=1 πk = 1), f and θk represent mixing proportion, spherical den-
sity function and parameter vector of kth mixture component, respectively. Infer-
ences of a mixture model cannot be directly done through the maximization of the
likelihood since group labels {z1 , z2 , . . . , zn } of n objects are unknown. The set of
pairs {(yi , zi )}ni=1 is usually referred to as the complete data set. The E-M algorithm
iteratively maximizes the conditional expectation of the complete log-likelihood,
beginning with initial values of θ (0) . Each expectation (E) step computes the ex-
pectation of the complete log-likelihood conditionally to the current value of θ (q) .
Then, the maximization (M) step maximizes the expectation of the complete log-
likelihood over θ (q) to provide an update for θ , i.e. θ (q+1) . Computations with high-
dimensional or large number of components can be quite demanding. In such cases,
Bayesian approaches can lead to significant computational savings and have been
quite popular.

2.3.1 Mixture of von-Mises Fisher Distributions

The most widely used mixture model is a mixture of von Mises Fisher(vMF)
distributions. The probability density function of vMF distribution is defined by
0
f (y|µ, κ) = cd (κ)eκ µ y , (3)

where µ is a mean vector, κ is a concentration parameter around µ, and cd de-

notes the normalizing constant. It is not possible to directly estimate κ value in
high dimensional data and an asymptotic approximation is used. vMF distribution
is unimodal and symmetric with circular contours. Various contour shapes of vMF
distribution are shown in Figure 1.

2.3.2 Score Matching Algorithm

While the E-M algorithm is most widely being used in a mixture modeling of
spherical clustering, it requires an approximation of the normalizing constant of a
spherical probability distribution, for example, κ in case of vMF distribution. Alter-
natively, score matching algorithm (Rosenbaum and Rubin, 1983) can be employed,
which does not require any knowledge about a normalizing constant. Let f (y; π)
6 Sungsu Kim and Ashis SenGupta

3.0

3.0
−0.1 −0.1
−0.2
−0.2 −0.2 −0.2
−0.3 −0.3 −0.3

2.5

2.5
−0.6
−0.4 −0.4 −0.4 −0.5

.4
−0.5

−0
−0.5
−0.7 −1
−0.6 −0.7

2.0

2.0
−0 −0
.8 .9

−1.
1.5

1.5

4
−0.7

−1
−0.8

.2
1.0

1.0

1.0
−0.6

−0
−0.6 .5
−0

.5
−0.5 −0.6 −0.8
−0.4

0.5

0.5
−0.4 −0.4
−0.3 −0.3 −0.3
−0.4
−0.2 −0.2 −0.2 −0.2
−0.1 −0.1 −0.1

0.0

0.0
0 0 0 0

3.0 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3

3.0

3.0
−0.2 −0.4
−1 −2
−0
2.5

2.5

2.5
.8
−1 −2
.2
−8

−4
2.0

2.0

0
−2

−4

−120
−5

−18
−2.6
−1.6
1.5

1.5

1.5
−6
−1.8

0
1.0

1.0

1.0
−1
−2

−3
0
−1.5 −4
−1
.6
0.5

0.5

0.5
−0

−0.5
−0.2
0.0

0.0

0.0
0 0 0 0

−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3

Fig. 1 Contour Plots of von Mises Fisher Distribution

forms a canonical exponential family on a compact oriented Riemannian manifold

with its density proportional to exp{π 0t(y)}, where π and t(y) denote vectors of
natural parameters and sufficient statistics, respectively. Then

π̂ = Wn−1 dn , (4)

where Wn and dn are sample averages over n data points of wab = E{< ta ,tb > (y)}
and dc = −E{∆M tc (y)}, where <, > the gradient inner product and ∆M the Lapace-
Beltrami operator.

2.3.3 Connection between Spherical K-Means and Mixture of von-Mises

Fisher Distributions

Suppose the concentration parameters of all components in a mixture of von-

Mises Fisher distributions are equal and infinite, and mixing proportions (πk0 s) are
all equal as well. Under these assumptions, the E-step reduces to assigning a data
point to its nearest cluster, where nearness is computed as cosine similarity between
the point and cluster representatives. Hence spherical K-means is a special case of
the vMF mixture model.
Clustering Methods for Spherical Data: an Overview and a New Generalization 7

2.3.4 Issues Related to Model-based Method

Some of the issues related to model-based methods include

• curse of dimensionality
• over-parameterizations in high dimension
• observations are small compared to the number of variables
• goodness of fit test is not available
• sensitive to initial values for θ , which are usually given by a partitioning method
such as spherical K-means
• vMF distribution is not suitable if shapes of clusters are not circular symmetric.

3 Alternative Model-based Method: Spherical Generalization of

Asymmetric Circular Distributions

In this section, alternative spherical probability models are discussed, which are
suitable to model non-symmetric cluster shapes.
The probability density function of Kent distribution (Kent, 1982) is defined by

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)2 − (ζ30 y)2 ,

(5)

where ζ1 , ζ2 , ζ3 are mean direction, major axis and minor axis vectors, resp. κ, β are
shape parameters, and Cκ denotes the normalizing constant. The density has ellipse-
like contours of constant probability density on the spherical surface. For a high-
dimension, maximum likelihood estimation is problematic and moment estimators
are available (Peel et al., 2001).
By construction, the mixture of the Inverse Stereographic Projection of Multi-
variate Normal Distribution has the isodensity lines that are inverse stereographic
mappings of ellipsoids, which allows asymmetric contour shapes. The necessary
and sufficient condition for the density being unimodal is that the greatest eigen-
1
value of the variance-covariance matrix is smaller than 2(p−1) , where p denotes the
dimension of a multivariate normal distribution used in the projection. There is no
closed form solution for µMLE (Dortet-Bernadet and Wicker, 2008).
While mixture models using Kent distribution or inverse stereographic projection
of normal distributions are suitable for elongated clusters in the data, using their el-
liptic contours, they will not perform well with clusters having shifted centers nor
non-convex clusters. Spherical generalizations of two asymmetric circular distribu-
tions found in the following sections provide more flexible model-based spherical
clustering.
8 Sungsu Kim and Ashis SenGupta

3.1 Spherical Generalization of GvM

When data lie in the unit circle, the generalized von Mises(GvM) density is given
by
exp(κ1 cos(θ − µ1 ) + κ2 cos 2(θ − µ2 ))
f (θ ) = R π , (6)
−π exp(κ1 cos(θ − µ1 ) + κ2 cos 2(θ − µ2 ))dθ

where µ1 , µ2 ∈ (−π, π] are location parameters, and κ1 , κ2 > 0 are shape parameters.
GvM distribution is suitable for modeling asymmetric and bimodal circular data,
and an extended model of the von Mises (vM) distribution.
A spherical generalization of GvM distribution has the density given by

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)2 − (ζ30 y)2 ,

(7)

where ζ ’s are orientation vectors, κ, β are shape parameters, and Cκ denotes the
normalizing constant.
Various contour shapes shown in Figure 2 suggest that a mixture model based
on spherical generalization of GvM distribution is appropriate for non-convex sym-
metric or asymmetric cluster shapes, as well as circular or elliptic symmetric cluster
shapes. The Kent distribution is a special case of (7), where ζ1 , ζ2 and ζ3 are con-
strained to be orthogonal.
3.0

3.0

3.0
−0.3 −0.5
−0.4 −0.6
2.5

2.5

2.5
−2 −2
−0.8
−1
2.0

2.0

2.0
−8
−2

−3
−10
−3

−14
1.5

1.5

1.5
−1
−1
−0.4
.4

−0.9
−0

−1.5

−6
1.0

1.0

1.0
−0.5

−2.5
−0.5
−0.5

−0.7 −1.5
0.5

0.5

−0.4 −0.5 −1
−0.3
−0.1 −0.2
0.0

0.0

0 0 0 0

−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3
3.0

3.0

−0.4
−0.5
−1 −1 −0.8
2.5

2.5

−0.6

−2 −2 −1
.2
−4

−2
−5−6
2.0

2.0

−2.2 −2
−1.5

−2.5
−6

−9
1.5

1.5

1.5
−5

−7

−1.8

−0.4
1.0

1.0

−3
−0

−3
−1.5
0.5

0.5

−1 −1
−0.5 −0.4
−0.2
0.0

0.0

0 0 0 0

−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3

Fig. 2 Contour Plots of Spherical GvM Distribution

Clustering Methods for Spherical Data: an Overview and a New Generalization 9

3.2 Spherical Generalization of GvM3

The three-parameter generalized von Mises(GvM3 ) density (Kim and SenGupta,

2012) is given by

exp(κ1 cos(θ − µ) + κ2 sin 2(θ − µ))

f (θ ) = R π , (8)
−π exp(κ1 cos(θ − µ) + κ2 sin 2(θ − µ))dθ

where µ ∈ (−π, π] is a location parameter, and κ1 > 0 and κ2 ∈ [−1, 1] are concen-
tration and skewness parameters, respectively. GvM3 distribution has an advantage
over GvM distribution with one less parameter and easier interpretation of the pa-
rameters.
A spherical generalization of GvM3 has the density given by

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)(ζ30 y) ,

(9)

where ζ ’s are orientation vectors, κ, β are shape parameters, and Cκ denotes the
normalizing constant.
Various contour shapes shown in Figure 3 suggest that a mixture model based on
spherical generalization of GvM3 distribution is appropriate for clusters with shifted
centers or clusters with a daughter cluster, as well as symmetric cluster shapes.
3.0

3.0

3.0
−0.2

−5
2.5

2.5

2.5
−0.8
−0.5 −1.5

−1
−0.5 −1

−1.2

−1
2.0

2.0

2.0
−2

5
−2
−4.5

−45
−3

−50
−2.6

−5
1.5

1.5

1.5
−1.4

−2
−1.5
1.0

1.0

1.0
−1

−10
−0.6

−1
−0.5
0.5

0.5

−0.2 −0.4
0.0

0.0

0 0 0 0

−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3
3.0

3.0

−0.2
−0.5 −0.5
−1 −0.3
−1 −0.4
2.5

2.5

−1 −0.5
−1
−1 −0
.6

−2
−−3

.5 .7
−0
4
2.0

2.0

−1

−0.9
−8

−2.5

−3
1.5

1.5

1.5
−2

−0.9
−0.6
−2.5

−0.7
−5

−2
1.0

1.0

−0
.6
−2 −0.5
0.5

0.5

−0.4
−0.3
−0.2
−0.1
0.0

0.0

0 0 0 0

−3 −1 1 3 −3 −1 1 3 −3 −1 1 3 −3 −1 1 3

Fig. 3 Contour Plots of Spherical GvM3 Distribution

10 Sungsu Kim and Ashis SenGupta

4 Concluding Remarks

In this chapter, an overview on spherical clustering was presented, and more flex-
ible alternative model-based methods were discussed. The authors suggest our read-
ers to consider the alternative model-based methods found in this chapter when clus-
ter shapes in the data set seem to arise from populations which have neither circular
nor elliptic contours. On the other hand, it is possible to consider more flexible alter-
native distance-based methods for asymmetric cluster shapes by developing suitable
similarity measures.

References

1. Banerjee, A., Dhillon, I.S., Ghosh, J., Sra, S.: Clustering on the unit hypersphere using von
Mises-Fisher Distributions. Journal of Machine Learning Research, 6, 1345–1382 (2005)
2. Dhillon, I.S., Modha, D.S.: Concept decompositions for large sparse text using clustering.
Machine Learning, 42, 143–175 (2001)
3. Dortet-Bernadet, J-N., Wicker, N.: Model-based clustering on the unit sphere with an illus-
tration using gene expression profiles. Biostatistics, 9, 66–80 (2008)
4. Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)
5. Hornik, K., Feinerer, I., Kober, M., Buchta, C.: Spherical K-means Clustering. Journal of
Statistical Software, 50, 1–22 (2012)
6. Jammalamadaka, S., SenGupta, A.: Topics in Circular Statistics. World Scientific, Singapore
(2001)
7. Johnson, R.A., Wichern, D.W.: Applied Multivariate Statistical Analysis. Pearson, New York
(2008)
8. Kent, J.T.: The Fisher-Bingham distribution on the sphere. Journal of the Royal Statistical
Society Series B, 44, 71–80 (1982)
9. Kesemen, O., Tezel, Ö., Özkul, E.: Fuzzy c-means clustering algorithm for directional data
(FCM4DD). Expert Systems with Applications, 58, 76–82 (2016)
10. Kim, S., SenGupta, A.: A three-parameter generalized von Mises distribution. Statistical Pa-
pers, 54, 685–693 (2012)
11. Peel, D., Whiten, W.J., McLachlan, G.J.: Fitting mixtures of Kent distributions to aid in joint
set identification. Journal of the American Statistical Association, 96, 56–63 (2001)
12. Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational stud-
ies for causal effects. Biometrika, 70, 41-55 (1983)
13. SenGupta, A.: High volatility, multimodal distributions and directional statistics. Special In-
vited Paper, Platinum Jubilee International Conference on Applicaitons of Statistics, Calcutta
University, December 21-23, 2016

View publication stats

(Ebook) Gestalt Therapy by Gordon Wheeler, Lena Axelsson ISBN 9781433818592, 1433818590 download
100% (1)
(Ebook) Gestalt Therapy by Gordon Wheeler, Lena Axelsson ISBN 9781433818592, 1433818590 download
56 pages
Tutorial Sphere Starccm
No ratings yet
Tutorial Sphere Starccm
25 pages
An Empirical Evaluation of Density-Based Clustering Techniques
No ratings yet
An Empirical Evaluation of Density-Based Clustering Techniques
8 pages
Survey of Clustering Data Mining Techniques: Pavel Berkhin
100% (1)
Survey of Clustering Data Mining Techniques: Pavel Berkhin
56 pages
Unit 4 Clustering
No ratings yet
Unit 4 Clustering
18 pages
Data Mining Unit-Iv
No ratings yet
Data Mining Unit-Iv
34 pages
A Survey of Some Density Based Clustering Techniques PDF
No ratings yet
A Survey of Some Density Based Clustering Techniques PDF
5 pages
DW & DM Unit 4 Notes
No ratings yet
DW & DM Unit 4 Notes
40 pages
Clustering
No ratings yet
Clustering
8 pages
Fundamentals of Data Science Unit 3
No ratings yet
Fundamentals of Data Science Unit 3
15 pages
CLUSTRING
No ratings yet
CLUSTRING
13 pages
Cluster Analysis
No ratings yet
Cluster Analysis
26 pages
Clustering Class Ppt
No ratings yet
Clustering Class Ppt
103 pages
Unit 3 Updated Notes
No ratings yet
Unit 3 Updated Notes
29 pages
Graph Partitioning Advance Clustering Technique
No ratings yet
Graph Partitioning Advance Clustering Technique
14 pages
Cluster Analysis
No ratings yet
Cluster Analysis
22 pages
E-Note_28966_Content_Document_20241211091351PM
No ratings yet
E-Note_28966_Content_Document_20241211091351PM
69 pages
DWDM Unit-5
No ratings yet
DWDM Unit-5
52 pages
A06-A Survey of Clustering Techniques
No ratings yet
A06-A Survey of Clustering Techniques
5 pages
SSRN Id3768295
No ratings yet
SSRN Id3768295
7 pages
Unsupervised Learning
No ratings yet
Unsupervised Learning
64 pages
Unsupervised Learning-01
No ratings yet
Unsupervised Learning-01
42 pages
Clustering Agglo Devisive DBSCAN
No ratings yet
Clustering Agglo Devisive DBSCAN
78 pages
Machine Learning Unit-4
No ratings yet
Machine Learning Unit-4
24 pages
A Distribution-Based Clustering Algorithm For Mining in Large Spatial Databases
No ratings yet
A Distribution-Based Clustering Algorithm For Mining in Large Spatial Databases
8 pages
Clustering
No ratings yet
Clustering
12 pages
BIS 541 Ch04 20-21 S
No ratings yet
BIS 541 Ch04 20-21 S
82 pages
Chatgpt Unit - 4
No ratings yet
Chatgpt Unit - 4
4 pages
Custer Analysis: Prepared by Navin Ninama
No ratings yet
Custer Analysis: Prepared by Navin Ninama
20 pages
Author's Accepted Manuscript: Pattern Recognition
No ratings yet
Author's Accepted Manuscript: Pattern Recognition
41 pages
Clustering Part2
No ratings yet
Clustering Part2
29 pages
DBSCAN Past, present and future
No ratings yet
DBSCAN Past, present and future
7 pages
Clustering
No ratings yet
Clustering
65 pages
Module 4 ML
No ratings yet
Module 4 ML
11 pages
Sathyabama Institute of Science and Technology SIT1301-Data Mining and Warehousing
No ratings yet
Sathyabama Institute of Science and Technology SIT1301-Data Mining and Warehousing
22 pages
By Lior Rokach and Oded Maimon: Clustering Methods
No ratings yet
By Lior Rokach and Oded Maimon: Clustering Methods
5 pages
Clustering new
No ratings yet
Clustering new
6 pages
Data-Clustering (Part I)
No ratings yet
Data-Clustering (Part I)
74 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
49 pages
DWDM - Unit - VI
No ratings yet
DWDM - Unit - VI
38 pages
Introduction to Cluster Analysis.
No ratings yet
Introduction to Cluster Analysis.
53 pages
Density & Grid based clustering
100% (1)
Density & Grid based clustering
21 pages
Spherical K-Means Clustering
No ratings yet
Spherical K-Means Clustering
22 pages
Data Mining-Unit IV
No ratings yet
Data Mining-Unit IV
15 pages
4.unit 4 ML Q&A
No ratings yet
4.unit 4 ML Q&A
73 pages
Comparison of Different Clustering Algorithms Using WEKA Tool
No ratings yet
Comparison of Different Clustering Algorithms Using WEKA Tool
3 pages
UNIT-4
No ratings yet
UNIT-4
106 pages
Data Clustering Seminar
No ratings yet
Data Clustering Seminar
34 pages
What Is Cluster Analysis?: - Cluster: A Collection of Data Objects
No ratings yet
What Is Cluster Analysis?: - Cluster: A Collection of Data Objects
51 pages
Data Clustering A Review
No ratings yet
Data Clustering A Review
60 pages
Module 5
No ratings yet
Module 5
91 pages
IT3080 Lecture04 2023
No ratings yet
IT3080 Lecture04 2023
56 pages
Final ML Unit3 May24
No ratings yet
Final ML Unit3 May24
154 pages
ML Unit 4 Notes - NJ
No ratings yet
ML Unit 4 Notes - NJ
15 pages
ml8
No ratings yet
ml8
5 pages
Cluster Analysis
No ratings yet
Cluster Analysis
36 pages
A Parallel Study On Clustering Algorithms in Data Mining
No ratings yet
A Parallel Study On Clustering Algorithms in Data Mining
7 pages
Clustering Analysis (Unsupervised)
No ratings yet
Clustering Analysis (Unsupervised)
6 pages
Extending the Boundaries: An Expansive Journey into Nonparametric Curve Estimation
From Everand
Extending the Boundaries: An Expansive Journey into Nonparametric Curve Estimation
Pasquale De Marco
No ratings yet
Substantive Theory and Constructive Measures: A Collection of Chapters and Measurement Commentary on Causal Science
From Everand
Substantive Theory and Constructive Measures: A Collection of Chapters and Measurement Commentary on Causal Science
Mark Everett Stone
No ratings yet
Union-Find Data Structures and Algorithms: Definitive Reference for Developers and Engineers
From Everand
Union-Find Data Structures and Algorithms: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Glossary of Research Methodology
From Everand
Glossary of Research Methodology
Dr. Awadhesh Kishore
No ratings yet
Unit-16, Part-2 & 3
No ratings yet
Unit-16, Part-2 & 3
38 pages
TRUMPF Genuine Parts Catalog EN
No ratings yet
TRUMPF Genuine Parts Catalog EN
124 pages
Geophysics For The Mineral Exploration Geoscientist-1-6
No ratings yet
Geophysics For The Mineral Exploration Geoscientist-1-6
8 pages
American Journal of Physics Volume 53 Issue 9 1985 (Doi 10.1119/1.14356) MacKeown, P. K. - Evaluation of Feynman Path Integrals by Monte Carlo Methods
No ratings yet
American Journal of Physics Volume 53 Issue 9 1985 (Doi 10.1119/1.14356) MacKeown, P. K. - Evaluation of Feynman Path Integrals by Monte Carlo Methods
6 pages
Secret Spy Duties STEM PowerPoint
No ratings yet
Secret Spy Duties STEM PowerPoint
11 pages
Comparison_of_Calculation_Methods_of_Elastic_Bondi
No ratings yet
Comparison_of_Calculation_Methods_of_Elastic_Bondi
15 pages
Heliyon: Ibukun Olorunniwo, Sunday J. Olotu, Olatunbosun A. Alao, Adekunle A. Adepelumi
No ratings yet
Heliyon: Ibukun Olorunniwo, Sunday J. Olotu, Olatunbosun A. Alao, Adekunle A. Adepelumi
7 pages
Cohesive Zone Modelling For Fatigue Life Analysis of Adhesive Joints
No ratings yet
Cohesive Zone Modelling For Fatigue Life Analysis of Adhesive Joints
99 pages
Vii Sci l15 m03 Light
No ratings yet
Vii Sci l15 m03 Light
10 pages
PDF The Works of Archimedes Edited in Modern Notation with Introductory Chapters Thomas L. Heath download
100% (1)
PDF The Works of Archimedes Edited in Modern Notation with Introductory Chapters Thomas L. Heath download
81 pages
6-111 ProjectileMotion July2016-Capstone
No ratings yet
6-111 ProjectileMotion July2016-Capstone
2 pages
2.inverse Trigonometry Functions 2ndPUC PYQs
100% (2)
2.inverse Trigonometry Functions 2ndPUC PYQs
2 pages
3 AB Trigonal Planar Trigonal Planar 120 Between All Bonds
No ratings yet
3 AB Trigonal Planar Trigonal Planar 120 Between All Bonds
5 pages
Moog Moog X761-S40J 4VBL
No ratings yet
Moog Moog X761-S40J 4VBL
5 pages
Spooning Lab Report
No ratings yet
Spooning Lab Report
2 pages
DLL_SCIENCE-5_Q1-W2
No ratings yet
DLL_SCIENCE-5_Q1-W2
10 pages
Topics in Mathematics Algebra & Advanced Math: Oblique Triangles (Sine Law, Cosine Law, Law of
No ratings yet
Topics in Mathematics Algebra & Advanced Math: Oblique Triangles (Sine Law, Cosine Law, Law of
4 pages
Optimization of Multiple Effect Evaporation System
100% (1)
Optimization of Multiple Effect Evaporation System
6 pages
1STC Molds 2013may
No ratings yet
1STC Molds 2013may
93 pages
Boyle'S Law Experiment: Short Description
No ratings yet
Boyle'S Law Experiment: Short Description
3 pages
05 NAC Initial Longitudinal Stability (160213)
No ratings yet
05 NAC Initial Longitudinal Stability (160213)
22 pages
Soal Tbi 6
No ratings yet
Soal Tbi 6
25 pages
11th Matric Physics 1 To 6 Chapters Objectives EM Questions Only
No ratings yet
11th Matric Physics 1 To 6 Chapters Objectives EM Questions Only
10 pages
3371 Leaflet Ita Eng 191206
No ratings yet
3371 Leaflet Ita Eng 191206
4 pages
Schaum's Outline of Signals and Systems (4th Edition) Hwei P. Hsu 2024 scribd download
100% (3)
Schaum's Outline of Signals and Systems (4th Edition) Hwei P. Hsu 2024 scribd download
40 pages
The Ra Contact - Teaching The Law of One (Sessions Only)
No ratings yet
The Ra Contact - Teaching The Law of One (Sessions Only)
866 pages
The Measurement of Sensation Full Book Access
100% (15)
The Measurement of Sensation Full Book Access
14 pages
Statistical Models And Methods For Data Science 1st Leonardo Grilli instant download
100% (1)
Statistical Models And Methods For Data Science 1st Leonardo Grilli instant download
46 pages

Clustering Methods For Spherical Data: An Overview and A New Generalization

Uploaded by

Clustering Methods For Spherical Data: An Overview and A New Generalization

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Clustering Methods for Spherical Data: An Overview and a New Generalization

Chapter · January 2018

Sungsu Kim Ashis Sengupta

SEE PROFILE SEE PROFILE

Generalized Variance - Overall Variability View project

Big Data Analytics on Circular Statistics View project

The user has requested enhancement of the downloaded file.

Sungsu Kim and Ashis SenGupta

Abstract Recent advances in data acquisition technologies have led to massive

In distance-based methods, a cluster is an aggregation of (data) objects in a mul-

2 What is Spherical Clustering?

2.1 Applications of Spherical Clustering

2.2 Distance-based Methods

2.2.1 Similarity Measures in Spherical Clustering

Cosine similarity measure quantifies similarity between two spherical objects as

the previous similarity measures, the KL divergence is not symmetric, as a result,

2.2.2 Spherical K-means and Fuzzy C-direction Algorithms

where M is a matrix of fuzzy memberships denoted by νki and m > 1, B is a matrix

2.2.3 Issues Related to Distance-based Methods

• The number of clusters needs to be provided.

2.3 Model-based Methods: Mixture Models

2.3.1 Mixture of von-Mises Fisher Distributions

where µ is a mean vector, κ is a concentration parameter around µ, and cd de-

2.3.2 Score Matching Algorithm

Fig. 1 Contour Plots of von Mises Fisher Distribution

forms a canonical exponential family on a compact oriented Riemannian manifold

2.3.3 Connection between Spherical K-Means and Mixture of von-Mises

Suppose the concentration parameters of all components in a mixture of von-

2.3.4 Issues Related to Model-based Method

Some of the issues related to model-based methods include

3 Alternative Model-based Method: Spherical Generalization of

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)2 − (ζ30 y)2 ,

3.1 Spherical Generalization of GvM

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)2 − (ζ30 y)2 ,

Fig. 2 Contour Plots of Spherical GvM Distribution

3.2 Spherical Generalization of GvM3

The three-parameter generalized von Mises(GvM3 ) density (Kim and SenGupta,

exp(κ1 cos(θ − µ) + κ2 sin 2(θ − µ))

f (y|ζ , κ) = Cκ exp κ(ζ10 y) + β (ζ20 y)(ζ30 y) ,

Fig. 3 Contour Plots of Spherical GvM3 Distribution

View publication stats

You might also like