0% found this document useful (0 votes)
26 views1 page

Approval Data Sets To Demonstrate The Clustering Performance of The Two Algorithms. Our

The document presents two algorithms, k-modes and k-prototypes, that extend the k-means algorithm to cluster data containing categorical values and mixed numeric and categorical attributes. The k-modes algorithm uses modes instead of means and a simple matching measure to cluster categorical objects. The k-prototypes algorithm further integrates k-means and k-modes through a combined dissimilarity measure to cluster mixed attribute data. Experiments on large real-world datasets demonstrate the algorithms' efficiency in clustering large amounts of data.

Uploaded by

Prem Kumar
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
26 views1 page

Approval Data Sets To Demonstrate The Clustering Performance of The Two Algorithms. Our

The document presents two algorithms, k-modes and k-prototypes, that extend the k-means algorithm to cluster data containing categorical values and mixed numeric and categorical attributes. The k-modes algorithm uses modes instead of means and a simple matching measure to cluster categorical objects. The k-prototypes algorithm further integrates k-means and k-modes through a combined dissimilarity measure to cluster mixed attribute data. Experiments on large real-world datasets demonstrate the algorithms' efficiency in clustering large amounts of data.

Uploaded by

Prem Kumar
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 1

The k-means algorithm is well known for its efficiency in clustering large data sets.

However, working only on numeric values prohibits it from being used to cluster real world data containing categorical values. In this paper we present two algorithms which extend the kmeans algorithm to categorical domains and domains with mixed numeric and categorical values. The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes algorithm, through the definition of a combined dissimilarity measure, further integrates the k-means and k-modes algorithms to allow for clustering objects described by mix The k-modes algorithm uses a simple matching dissimilarity measure to deal with categorical objects, replaces the means of clusters with modes, and uses a frequency-based method to update modes in the clustering process to minimise the clustering cost function. With these extensions the k-modes algorithm enables the clustering of categorical data in a fashion similar to k-means. The k-prototypes ed numeric and categorical attributes. We use the well known soybean disease and credit approval data sets to demonstrate the clustering performance of the two algorithms. Our experiments on two real world data sets with half a million objects each show that the two algorithms are efficient when clustering large data sets, which is critical to data mining applications.

You might also like