A Strongly Consistent Sparse $k$-means Clustering with Direct $l_1$ Penalization on Variable Weights

Chakraborty, Saptarshi; Das, Swagatam

arXiv:1903.10039 (stat)

[Submitted on 24 Mar 2019]

Abstract: We propose the Lasso Weighted $k$-means ($LW$-$k$-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data, where the number of features ($p$) can be much larger than the number of observations ($n$). In the $LW$-$k$-means algorithm, we place a lasso-based penalty term directly on the feature weights to incorporate feature selection into the framework of sparse clustering. $LW$-$k$-means makes no distributional assumptions about the given dataset and thus yields a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in $LW$-$k$-means and establish the strong consistency of our algorithm. $LW$-$k$-means is tested on several real-life and synthetic datasets, and through detailed experimental analysis we find that the method is highly competitive with some state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also with respect to computational time.
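To make the idea of a lasso penalty placed directly on feature weights concrete, here is a minimal sketch. It is not taken from the paper: the objective $\sum_l (w_l^2 + \lambda |w_l|) D_l - \alpha \sum_l w_l$ (with $D_l$ the average within-cluster squared deviation of feature $l$), the parameters `alpha` and `lam`, and the function names are all assumptions chosen for illustration, and the authors' exact formulation may differ. Under this assumed objective, the weight update has a closed soft-thresholding form, $w_l = \max(0, \alpha - \lambda D_l) / (2 D_l)$, so any feature with $D_l \ge \alpha / \lambda$ receives weight exactly zero.

```python
import random


def penalized_sq_dist(x, z, w, lam):
    """Squared distance with per-feature coefficient (w_l^2 + lam * w_l).

    Assumed form for illustration only, not the paper's exact dissimilarity.
    """
    return sum((wl * wl + lam * wl) * (a - b) ** 2 for a, b, wl in zip(x, z, w))


def lasso_weighted_kmeans(X, k, alpha=1.0, lam=2.0, n_iter=20, init=None, seed=0):
    """Block coordinate descent on the assumed objective
    sum_l (w_l^2 + lam * w_l) * D_l - alpha * sum_l w_l, with w_l >= 0.

    X is a list of equal-length lists of floats. `init` is an optional list
    of k starting centroids; otherwise k data points are sampled at random.
    """
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(X, k)
    centers = [list(c) for c in start]
    w = [1.0] * p          # start with all features equally weighted
    labels = [0] * n
    eps = 1e-12            # guards against division by a zero dispersion

    for _ in range(n_iter):
        # 1) Assign each point to the nearest centroid under current weights.
        labels = [
            min(range(k), key=lambda j: penalized_sq_dist(x, centers[j], w, lam))
            for x in X
        ]
        # 2) Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # 3) Per-feature average within-cluster squared deviation D_l.
        D = [
            sum((x[l] - centers[lab][l]) ** 2 for x, lab in zip(X, labels)) / n
            for l in range(p)
        ]
        # 4) Soft-thresholded weight update: features with D_l >= alpha / lam
        #    are dropped (weight exactly 0).
        w = [max(0.0, alpha - lam * d) / (2.0 * (d + eps)) for d in D]

    return labels, centers, w
```

In this sketch, a feature whose values vary widely inside every cluster is zeroed out once its dispersion reaches `alpha / lam`, while features that separate the clusters keep positive weight. Like ordinary $k$-means, the procedure is sensitive to initialization, so passing `init` explicitly or trying several seeds is advisable.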

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1903.10039 [stat.ML] (or arXiv:1903.10039v1 [stat.ML] for this version)
https://doi.org/10.48550/arXiv.1903.10039

Submission history

From: Swagatam Das
[v1] Sun, 24 Mar 2019 18:45:35 UTC (3,117 KB)
