Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

Goswami, Saptarsi; Chakrabarti, Amlan

Abstract: Clustering is one of the main tasks in exploratory data analysis and descriptive statistics, where the main objective is partitioning observations into groups. Clustering has a broad range of applications in varied domains such as climate, business, information retrieval, biology and psychology. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of attribute values, density, distance, etc. However, these definitions fail to attach a clear meaning (semantics) to the generated clusters. We argue that clusters with understandable and distinct semantics defined in terms of quartiles/halves are more appealing to business analysts than clusters defined by data boundaries or prototypes. On the same premise, we propose a new algorithm, named the quartile clustering technique. Through a series of experiments we establish the efficacy of this algorithm. We demonstrate that, compared to K-means, the quartile clustering technique adds a clear meaning to each of the clusters. We use the DB (Davies-Bouldin) index to measure the goodness of the clusters and show that our method is comparable to EM (Expectation Maximization), PAM (Partitioning Around Medoids) and K-means. We have explored its capability in detecting outliers and the benefit of the added semantics. We discuss some of the limitations of its present form and also provide a rough direction for addressing the issue of merging the generated clusters.
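
The abstract does not spell out the algorithm, so the following is only a minimal sketch under a stated assumption, not the authors' procedure: each observation is labelled by the quartile (or half, split at the median) that each of its attribute values falls into, and observations sharing the same tuple of labels form one cluster. The function names `interval_label`, `quantile_clusters` and `davies_bouldin` are hypothetical. The sketch also scores the resulting partition with the Davies-Bouldin (DB) index mentioned above, for which lower values indicate more compact, better-separated clusters.

```python
import math
import statistics
from collections import defaultdict


def interval_label(value, cuts):
    """Return 1..len(cuts)+1: the quantile interval that `value` falls in."""
    # Values equal to a cut point go to the lower interval.
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i + 1
    return len(cuts) + 1


def quantile_clusters(rows, n=4):
    """Label each row with the tuple of its per-attribute quantile intervals.

    ASSUMPTION (illustrative only, not the paper's exact method): n=4 splits
    each attribute into quartiles, n=2 into halves at the median; rows that
    share a label tuple are treated as one cluster.
    """
    columns = list(zip(*rows))
    cuts = [statistics.quantiles(col, n=n) for col in columns]
    return [tuple(interval_label(v, c) for v, c in zip(row, cuts)) for row in rows]


def davies_bouldin(rows, labels):
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    keys = list(groups)
    # Centroid of each cluster: per-attribute mean.
    centroids = {k: [statistics.fmean(col) for col in zip(*groups[k])] for k in keys}
    # Scatter S_k: mean Euclidean distance of members to their centroid.
    scatter = {k: statistics.fmean(math.dist(p, centroids[k]) for p in groups[k])
               for k in keys}
    # For each cluster, keep its worst (largest) similarity ratio to any other.
    total = sum(
        max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys if j != i)
        for i in keys
    )
    return total / len(keys)


rows = [(1, 80), (2, 70), (3, 60), (4, 50), (5, 40), (6, 30), (7, 20), (8, 10)]
labels = quantile_clusters(rows)
print(labels)  # [(1, 4), (1, 4), (2, 3), (2, 3), (3, 2), (3, 2), (4, 1), (4, 1)]
print(davies_bouldin(rows, labels))  # approximately 0.5
```

In this toy example a label such as `(1, 4)` reads directly as "first attribute in the bottom quartile, second attribute in the top quartile", which illustrates the kind of cluster semantics the abstract argues for; passing `n=2` gives the coarser halves-based labelling instead.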

Comments:	ISSN 2151-9617
Subjects:	Databases (cs.DB)
Cite as:	arXiv:1203.4157 [cs.DB]
	(or arXiv:1203.4157v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1203.4157
Journal reference:	Journal of Computing, Volume 4, Issue 2, February 2012, 48-55
