A Domain Adaptive Density Clustering Algorithm for Data with Varying Density Distribution

The document presents a Domain-Adaptive Density Clustering (DADC) algorithm designed to improve clustering results for data with varying density distributions, equilibrium distributions, and multiple domain-density maximums. The DADC algorithm employs a domain-adaptive density measurement method, a cluster center self-identification approach, and a cluster self-ensemble technique to effectively identify and merge clusters, addressing issues of sparse cluster loss and fragmentation. Experimental results indicate that DADC outperforms existing algorithms in terms of clustering accuracy while maintaining low computational complexity, making it suitable for large-scale data applications.

ABSTRACT:

As an efficient type of unsupervised learning method, clustering algorithms have
been widely used in data mining and knowledge discovery with noticeable
advantages. However, clustering algorithms based on density peaks are of limited
effectiveness on data with varying density distribution (VDD), equilibrium
distribution (ED), and multiple domain-density maximums (MDDM), leading to the
problems of sparse cluster loss and cluster fragmentation. To address these
problems, we propose a Domain-Adaptive Density Clustering (DADC) algorithm,
which consists of three steps: domain-adaptive density measurement, cluster
center self-identification, and cluster self-ensemble. For data with VDD features,
clusters in sparse regions are often neglected when a uniform density peak
threshold is applied, which results in the loss of sparse clusters.

We define a domain-adaptive density measurement method based on K-Nearest
Neighbors (KNN) to adaptively detect density peaks in regions of different
density. We treat each data point and its KNN neighborhood as a subgroup to
better reflect its density distribution from a domain perspective. In addition,
for data with ED or MDDM features, a large number of density peaks with similar
values can be identified, which results in cluster fragmentation.
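
The Python sketch below illustrates the KNN-based domain density and why a single global threshold tends to lose sparse clusters. It is a minimal sketch under stated assumptions, not the authors' DADC implementation: the KNN density (inverse of the mean distance to the K nearest neighbors), the domain density (mean KNN density over the subgroup formed by a point and its KNN), the 90% quantile cut-off, and the helper names `knn_graph`, `domain_density`, `global_peaks`, and `domain_peaks` are all hypothetical choices.

```python
import numpy as np


def knn_graph(X, k):
    """Brute-force k nearest neighbours: neighbour indices and distances (self excluded)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def domain_density(X, k):
    """Assumed definitions: KNN density = 1 / mean KNN distance;
    domain density = mean KNN density over the subgroup {i} plus KNN(i)."""
    idx, knn_dist = knn_graph(X, k)
    rho = 1.0 / (knn_dist.mean(axis=1) + 1e-12)
    subgroup = np.concatenate([rho[:, None], rho[idx]], axis=1)
    return subgroup.mean(axis=1), idx


def global_peaks(dens, quantile=0.9):
    """Uniform threshold: one density cut-off shared by every region (hypothetical 90% quantile)."""
    return np.flatnonzero(dens >= np.quantile(dens, quantile))


def domain_peaks(dens, idx):
    """Domain-adaptive peaks (assumed rule): points at least as dense as every member of their KNN subgroup."""
    return np.flatnonzero(dens >= dens[idx].max(axis=1))


# Usage: a dense blob (points 0-59) and a sparse blob (points 60-119).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (60, 2)), rng.normal(8.0, 1.5, (60, 2))])
dens, idx = domain_density(X, k=8)
peaks = domain_peaks(dens, idx)
print("global-threshold peaks in sparse blob:", int((global_peaks(dens) >= 60).sum()))
print("domain peaks in dense / sparse blob:", int((peaks < 60).sum()), int((peaks >= 60).sum()))
```

On data like this the global cut-off keeps only dense-blob points, while the local-maximum rule finds peaks in both blobs. The same rule can also return several peaks per blob, which is the fragmentation that the center self-identification and self-ensemble steps are meant to resolve.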

We propose cluster center self-identification and cluster self-ensemble methods
to automatically extract the initial cluster centers and merge the fragmented
clusters. Experimental results demonstrate that, compared with other
algorithms, the proposed DADC algorithm can obtain more reasonable clustering
results on data with VDD, ED, and MDDM features. Because it requires few
parameters and is non-iterative, DADC achieves low computational complexity
and is suitable for large-scale data clustering.
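
Cluster center self-identification works on a decision graph whose axes are domain density and Delta distance. The Python sketch below computes both quantities and picks centers automatically. Delta follows the usual density-peak convention (distance to the nearest point of higher density, with the densest point taking its largest distance); the domain-density definition, the function names, and the outlier rule `delta > mean + c * std` with a hypothetical `c = 3` are assumptions for illustration, not the thresholds DADC actually derives.

```python
import numpy as np


def domain_density(X, k):
    """Assumed: mean of (1 / mean KNN distance) over each point's KNN subgroup."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)  # exclude self from the neighbour search
    idx = np.argsort(masked, axis=1)[:, :k]
    rho = 1.0 / (np.take_along_axis(masked, idx, axis=1).mean(axis=1) + 1e-12)
    dens = np.concatenate([rho[:, None], rho[idx]], axis=1).mean(axis=1)
    return dens, dist


def delta_distance(dens, dist):
    """Delta: distance to the nearest denser point; the densest point takes its largest distance."""
    delta = np.empty(len(dens))
    for i in range(len(dens)):
        denser = dens > dens[i]
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return delta


def self_identify_centers(delta, c=3.0):
    """Hypothetical threshold rule: centres are points whose Delta is an outlier (> mean + c * std)."""
    return np.flatnonzero(delta > delta.mean() + c * delta.std())


# Usage: a dense blob (points 0-59) and a sparse blob (points 60-119).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (60, 2)), rng.normal(12.0, 1.5, (60, 2))])
dens, dist = domain_density(X, k=8)
delta = delta_distance(dens, dist)  # (dens[i], delta[i]) is point i on the decision graph
centers = self_identify_centers(delta)
print("centres:", centers)
```

This rule deliberately ignores the density axis so that centers in sparse regions are not discarded by a global density cut-off; a practical version would also need to reject isolated outliers, which likewise have large Delta values.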

EXISTING SYSTEM:

Compared with existing clustering algorithms, the domain-adaptive density
method proposed in this work can adaptively detect the domain densities and
cluster centers in regions with different densities.

The method is feasible and practical for real-world big data applications. The
proposed cluster center self-identification method can effectively identify
candidate cluster centers with minimal manual intervention.

Moreover, the proposed Cluster Fusion Degree (CFD) model takes full account of
the relationships between clusters in large-scale datasets, including
inter-cluster density similarity, cluster crossover degree, and cluster density
stability.

PROPOSED SYSTEM:

• To address the problem of sparse cluster loss on data with VDD, a
domain-adaptive density measurement method is proposed to detect density peaks
in different density regions. Based on these density peaks, cluster centers in
both dense and sparse regions are effectively discovered, which addresses the
sparse cluster loss problem.

• To automatically extract the initial cluster centers, we draw a clustering
decision graph based on domain density and Delta distance. We then propose a
cluster center self-identification method that automatically determines the
parameter thresholds and cluster centers from the clustering decision graph.

• To address the problem of cluster fragmentation on data with ED or MDDM, a
Cluster Fusion Degree (CFD) model is proposed, which consists of three
components: inter-cluster density similarity, cluster crossover degree, and
cluster density stability.
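
The CFD formulas are not given here, so the Python sketch below uses hypothetical stand-ins for the three components: density similarity as the ratio of the two clusters' mean KNN densities, crossover degree as the fraction of points whose KNN neighborhood reaches into the other cluster, and density stability as the mean density of those crossover points relative to the sparser cluster (capped at 1). Their product serves as an assumed fusion degree, and fragment pairs scoring above a hypothetical `threshold` are merged with a union-find structure. The function names and the merge threshold are illustrative only and are not the authors' definitions.

```python
import numpy as np


def knn_density(X, k):
    """KNN neighbour indices and an assumed density of 1 / mean KNN distance."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self from the neighbour search
    idx = np.argsort(dist, axis=1)[:, :k]
    rho = 1.0 / (np.take_along_axis(dist, idx, axis=1).mean(axis=1) + 1e-12)
    return idx, rho


def fusion_degree(a, b, labels, idx, rho):
    """Hypothetical CFD stand-in: similarity * crossover * stability, each in [0, 1]."""
    in_a, in_b = labels == a, labels == b
    ra, rb = rho[in_a].mean(), rho[in_b].mean()
    similarity = min(ra, rb) / max(ra, rb)  # inter-cluster density similarity
    # Crossover points: members of one cluster with a KNN neighbour in the other.
    cross = (in_a & (labels[idx] == b).any(axis=1)) | (in_b & (labels[idx] == a).any(axis=1))
    if not cross.any():
        return 0.0  # the two clusters never touch
    crossover = cross.sum() / (in_a.sum() + in_b.sum())  # cluster crossover degree
    stability = min(1.0, rho[cross].mean() / min(ra, rb))  # no density valley at the border
    return similarity * crossover * stability


def self_ensemble(labels, idx, rho, threshold=0.05):
    """Merge fragment pairs whose fusion degree exceeds a hypothetical threshold (union-find)."""
    parent = {c: c for c in np.unique(labels).tolist()}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    clusters = sorted(parent)
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            # Fusion degrees are scored on the original fragments, not on merged clusters.
            if fusion_degree(a, b, labels, idx, rho) > threshold:
                parent[find(b)] = find(a)
    return np.array([find(c) for c in labels.tolist()])


# Usage: one blob cut into fragments 0 and 1, plus a distant blob labelled 2.
rng = np.random.default_rng(2)
blob = rng.normal(0.0, 1.0, (80, 2))
far = rng.normal(10.0, 1.0, (40, 2))
X = np.vstack([blob, far])
labels = np.concatenate([(blob[:, 0] >= np.median(blob[:, 0])).astype(int), np.full(40, 2)])
idx, rho = knn_density(X, k=8)
merged = self_ensemble(labels, idx, rho)
print(merged)
```

In this toy example the sketch reunites the two artificial fragments while keeping the distant blob separate, because the assumed fusion degree is zero for clusters whose KNN neighborhoods never touch.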

You might also like