Data Mining: Constraint-Based Cluster Analysis
Constraint-Based Clustering
Constraint-based clustering finds clusters that satisfy user-specified preferences or constraints
It is desirable for the clustering process to take user preferences and constraints into consideration
Expected number of clusters
Maximal / Minimal Cluster size
Weights for dimensions / Important dimensions
Constraints make the mining more focused
Categories of Constraints
Constraints on Individual objects
Pair-wise constraints
constraints
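Pair-wise constraints are commonly written as "must-link" (two objects belong in the same cluster) and "cannot-link" (two objects belong in different clusters). Those two names, and the hypothetical helper `violated_constraints`, are assumptions for illustration and do not come from the slides. A minimal sketch of checking a clustering against such constraints:

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # indices of two objects


def violated_constraints(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> List[Tuple[str, int, int]]:
    """Return every pair-wise constraint that the clustering breaks."""
    violations = []
    for a, b in must_link:
        # A must-link pair is violated when the objects are separated.
        if labels[a] != labels[b]:
            violations.append(("must-link", a, b))
    for a, b in cannot_link:
        # A cannot-link pair is violated when the objects share a cluster.
        if labels[a] == labels[b]:
            violations.append(("cannot-link", a, b))
    return violations


# Four objects: the first two in cluster 0, the last two in cluster 1.
labels = [0, 0, 1, 1]
print(violated_constraints(labels, [(0, 1), (1, 2)], [(0, 3), (2, 3)]))
```

A constraint-based algorithm could use such a check either to reject assignments outright or to add a penalty per violation.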
Iteratively refine the solution
Move m customers from cluster Ci to Cj if Ci has at least m surplus customers
The move is made only if the total sum of distances between objects and their cluster centers is reduced
The search can be directed by selecting promising points to move
Deadlock has to be avoided (a state in which the constraint cannot be satisfied)
Can work on micro-clusters instead of individual points
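The refinement loop above can be sketched as follows, under several assumptions that are not in the slides: the constraint is a minimum cluster size `min_size`, the starting assignment already satisfies it, one object is moved at a time (m = 1), and centers are recomputed as means. The function names are hypothetical, and no deadlock handling or micro-clustering is attempted.

```python
import math


def centroid(members):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(members)
    return tuple(sum(c) / n for c in zip(*members))


def total_cost(points, assign, k):
    """Sum of distances from every object to its cluster center."""
    cost = 0.0
    for j in range(k):
        members = [p for p, a in zip(points, assign) if a == j]
        center = centroid(members)
        cost += sum(math.dist(p, center) for p in members)
    return cost


def refine(points, assign, k, min_size, max_passes=100):
    """Move single objects out of clusters that have surplus members,
    keeping a move only when it lowers the total cost."""
    assign = list(assign)
    for _ in range(max_passes):
        improved = False
        for idx in range(len(points)):
            src = assign[idx]
            # Only clusters above the minimum size may give up an object.
            if assign.count(src) - min_size < 1:
                continue
            best_dst, best_cost = src, total_cost(points, assign, k)
            for dst in range(k):
                if dst == src:
                    continue
                assign[idx] = dst  # tentative move
                cost = total_cost(points, assign, k)
                if cost < best_cost - 1e-12:
                    best_dst, best_cost = dst, cost
            assign[idx] = best_dst  # keep the best move, or restore src
            if best_dst != src:
                improved = True
        if not improved:
            break
    return assign


points = [(0.0,), (1.0,), (2.0,), (9.0,), (10.0,)]
print(refine(points, [0, 0, 1, 1, 1], k=2, min_size=2))
```

Because every accepted move strictly lowers the cost and never shrinks a cluster below `min_size`, the loop stops at a feasible local optimum.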
Semi-Supervised Cluster Analysis
Constraint-based semi-supervised clustering
Relies on user-provided labels or constraints
Initialize the clusters based on the labeled objects
Modify the objective function
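The initialization step can be illustrated with a small sketch: each initial center is the mean of the objects that carry the same user-provided label, and every object is then assigned to its nearest center. The helper names `seed_centers` and `assign_to_nearest` are hypothetical, and the sketch covers only the initialization, not a modified objective function.

```python
import math
from collections import defaultdict


def seed_centers(points, labels):
    """labels maps an object index to its user-provided label.
    Returns one initial center per label: the mean of its labeled objects."""
    groups = defaultdict(list)
    for idx, lab in labels.items():
        groups[lab].append(points[idx])
    return {
        lab: tuple(sum(c) / len(members) for c in zip(*members))
        for lab, members in groups.items()
    }


def assign_to_nearest(points, centers):
    """Assign every object, labeled or not, to the closest seeded center."""
    return [
        min(centers, key=lambda lab: math.dist(p, centers[lab]))
        for p in points
    ]


points = [(0, 0), (0, 2), (1, 1), (8, 8), (10, 8), (9, 9)]
labels = {0: "a", 1: "a", 3: "b"}  # only three objects are labeled
centers = seed_centers(points, labels)
print(assign_to_nearest(points, centers))
```

From here an ordinary k-means style loop could continue, with the seeded centers replacing a random start.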
Distance-based semi-supervised clustering
An adaptive distance measure is trained to satisfy the labels or constraints
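One very simple way to picture an adaptive distance measure is a weighted Euclidean distance whose per-dimension weights are derived from the constraints. The sketch below is a deliberately simplistic heuristic chosen for illustration, not a specific published method: dimensions along which must-link pairs differ a lot receive small weights. The names `learn_weights` and `weighted_distance`, and the use of must-link pairs, are assumptions.

```python
import math


def learn_weights(points, must_link, eps=1e-9):
    """Heuristic: weight each dimension by the inverse of the mean squared
    difference of must-link pairs along it, then normalize to sum to 1."""
    dims = len(points[0])
    spread = [0.0] * dims
    for a, b in must_link:
        for d in range(dims):
            spread[d] += (points[a][d] - points[b][d]) ** 2
    raw = [1.0 / (s / len(must_link) + eps) for s in spread]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(p, q, weights):
    """Euclidean distance with one non-negative weight per dimension."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


# Must-link pairs differ only along the second dimension,
# so that dimension should be down-weighted.
points = [(0, 0), (0, 5), (1, 0), (1, 5)]
weights = learn_weights(points, must_link=[(0, 1), (2, 3)])
print(weights)
print(weighted_distance(points[0], points[1], weights))  # must-link pair
print(weighted_distance(points[0], points[2], weights))  # unconstrained pair
```

Under the learned weights the must-link pair ends up much closer than the unconstrained pair, which is the behavior the adapted measure is meant to produce.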
Integrates unsupervised clustering with supervised classification
Transforms the clustering task into a classification task
Points to be clustered are labeled as class Y
Non-existence points
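The transformation can be sketched as building a two-class training set: the real points get class Y, and artificial "non-existence" points are added with a second class. In this sketch the second class is called N and the artificial points are drawn uniformly inside the bounding box of the data. Both choices, and the function name `add_nonexistence_points`, are assumptions for illustration.

```python
import random


def add_nonexistence_points(points, n_fake, seed=0):
    """Return (X, y): the real points labeled "Y" followed by n_fake
    uniformly sampled artificial points labeled "N"."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    lows = [min(c) for c in zip(*points)]
    highs = [max(c) for c in zip(*points)]
    fake = [
        tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        for _ in range(n_fake)
    ]
    X = list(points) + fake
    y = ["Y"] * len(points) + ["N"] * n_fake
    return X, y


points = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.5, 8.8)]
X, y = add_nonexistence_points(points, n_fake=4)
print(len(X), y)
```

Any supervised classifier, for example a decision tree, could then be trained on `X` and `y`; regions it predicts as Y are dense in real points and can be read as clusters.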