The document provides an introduction to clustering algorithms, focusing on density-based methods like DENCLUE, which utilizes kernel density estimation to identify clusters of arbitrary shape while handling noise. It outlines the algorithm's two main steps: preprocessing and clustering, emphasizing its efficiency and ability to scale. Additionally, it discusses the challenges of kernel methods in high-dimensional spaces and suggests potential solutions for improving model interpretability.
Introduction to Some Complementary Algorithms for Clustering Data
Collected & Prepared by:
Morteza H. Chehreghani
Data Mining Course, Sharif University of Technology

Chapter 8. Cluster Analysis
• What is Cluster Analysis?
• Types of Data in Cluster Analysis
• A Categorization of Major Clustering Methods
• Partitioning Methods
• Hierarchical Methods
• Density-Based Methods
• Grid-Based Methods
• Model-Based Clustering Methods
• Outlier Analysis
• Summary

Density-Based Clustering Methods
• Clustering based on density (a local cluster criterion), such as density-connected points
• Major features:
– Discover clusters of arbitrary shape
– Handle noise
– One scan
– Need density parameters
• Several interesting studies:
– DBSCAN: Ester, et al. (KDD'96)
– OPTICS: Ankerst, et al. (SIGMOD'99)
– DBRS: Wang, et al. (2003)
– DENCLUE: Hinneburg & Keim (KDD'98)
– CLIQUE: Agrawal, et al. (SIGMOD'98)

DENCLUE: Using Density Functions
• DENsity-based CLUstEring, by Hinneburg & Keim (KDD'98)
• Major features:
– Solid mathematical foundation
– Good for data sets with large amounts of noise
– Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
– Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
– But needs a large number of parameters

DENCLUE
• Models the overall density of a set of points as the sum of 'influence' functions associated with each point.
• The resulting overall density function has local peaks, i.e., local density maxima, and these local peaks can be used to define clusters in a straightforward way.
• For each data point, a hill-climbing procedure finds the nearest peak associated with that point, and the set of all data points associated with a particular peak (called a local density attractor) becomes a (center-defined) cluster.
• If the density at a local peak is too low, then the points in the associated cluster are classified as noise and discarded.
• If a local peak can be connected to a second local peak by a path of data points, and the density at each point on the path is above a minimum density threshold, then the clusters associated with these local peaks are merged.
• Thus, clusters of any shape can be discovered.

DENCLUE
• DENCLUE is based on a well-developed area of statistics and pattern recognition known as 'kernel density estimation.'
• The goal of kernel density estimation (and of many other statistical techniques as well) is to describe the distribution of the data by a function.
• For kernel density estimation, the contribution of each point to the overall density function is expressed by an 'influence' (kernel) function.
• The overall density is then merely the sum of the influence functions associated with each point.

DENCLUE
• Typically, the influence or kernel function is symmetric (the same in all directions), and its value (contribution) decreases as the distance from the point increases.
• The Gaussian function is often used as a kernel function.
Influence Function
• Example (figure)
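To make the influence-function idea concrete, the following is a minimal sketch (not from the original slides) of the overall density at a query point computed as a sum of Gaussian influence functions; the function names, data, and choice of sigma are illustrative assumptions.

    import numpy as np

    def gaussian_influence(x, xi, sigma):
        # Influence of data point xi at location x: a Gaussian kernel
        # whose contribution decays with squared Euclidean distance.
        return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

    def overall_density(x, data, sigma):
        # DENCLUE-style overall density at x: the sum of the
        # influence functions of all data points.
        return sum(gaussian_influence(x, xi, sigma) for xi in data)

    data = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0]])
    print(overall_density(np.array([1.1, 1.0]), data, sigma=0.5))  # high: near a dense region
    print(overall_density(np.array([3.0, 3.0]), data, sigma=0.5))  # low: far from all points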
Density Attractor (figure)
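The hill climbing toward a density attractor can be sketched as gradient ascent on the summed Gaussian influences above; the update rule, step size, and tolerance below are illustrative assumptions rather than the exact procedure of the original paper.

    import numpy as np

    def density_gradient(x, data, sigma):
        # Gradient of the overall density (sum of Gaussian influences) at x.
        grad = np.zeros_like(x)
        for xi in data:
            w = np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))
            grad += w * (xi - x) / sigma ** 2
        return grad

    def find_attractor(x, data, sigma, step=0.05, tol=1e-5, max_iter=500):
        # Move uphill along the density gradient; the point of
        # convergence is the density attractor of the starting point x.
        for _ in range(max_iter):
            x_new = x + step * density_gradient(x, data, sigma)
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    # Points whose hill climbs converge to (nearly) the same attractor
    # belong to the same center-defined cluster.
    data = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0]])
    print(find_attractor(np.array([1.3, 1.1]), data, sigma=0.5))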
DENCLUE
• The DENCLUE algorithm has two steps:
– a preprocessing step
– a clustering step
• In the preprocessing step, a grid for the data is created by dividing the minimal bounding hyper-rectangle into d-dimensional hyper-rectangles with an edge length of 2σ. The hyper-rectangles that contain points are then determined. (Actually, only the occupied hyper-rectangles are constructed.) The hyper-rectangles are numbered with respect to a particular origin (at one edge of the bounding hyper-rectangle), and these keys are stored in a search tree to provide efficient access during later processing. For each stored cell, the number of points, the sum of the points in the cell, and the connections to neighboring populated cubes are also stored.

DENCLUE
• For the clustering step, DENCLUE considers only the highly populated cubes and the cubes that are connected to them.
• Starting with each of these cubes as a cluster, the algorithm proceeds as follows: for each point x, the local density function is calculated by considering only those points that
– a) are in clusters connected to the cluster containing x, and
– b) have cluster centroids within a distance of kσ of x, where k = 4.
• DENCLUE discards clusters associated with a density attractor whose density is less than ξ.
• Finally, DENCLUE merges density attractors that can be joined by a path of points, all of which have a density greater than ξ.
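A minimal sketch of the preprocessing step described above, assuming the data sit in a NumPy array; a plain dict keyed by integer cell coordinates stands in for the search tree, and only occupied cells are ever created.

    import numpy as np
    from collections import defaultdict

    def build_occupied_cells(data, sigma):
        # Divide the minimal bounding hyper-rectangle into cells of
        # edge length 2*sigma and record, for each occupied cell, the
        # number of points it contains and the (vector) sum of those points.
        origin = data.min(axis=0)
        edge = 2.0 * sigma
        cells = defaultdict(lambda: {"count": 0, "sum": 0.0})
        for x in data:
            key = tuple(((x - origin) // edge).astype(int))
            cells[key]["count"] += 1
            cells[key]["sum"] = cells[key]["sum"] + x
        return dict(cells)  # only occupied cells are materialized

    cells = build_occupied_cells(np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0]]), sigma=0.5)
    print(cells)  # two occupied cells here: one with count 2, one with count 1

The highly populated cells (count above a threshold) would then be linked to their occupied neighbors before the clustering step.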
DENCLUE
• This provides a high level of generality: with appropriate choices of the influence function and its parameters, DENCLUE can reproduce the behavior of other methods, such as:
– DBSCAN
– k-means (center-defined) clusters
• DENCLUE scales well, since at its initial stage it builds a map of hyper-rectangular cubes with edge length 2σ.
• For this reason, the algorithm can also be classified as a grid-based method.
Kernel Density Estimation
• Kernel estimates smooth out the contribution of each observed data point over a local neighborhood of that point.
• The contribution of data point x(i) to the estimate at some point x* depends on how far apart x(i) and x* are.
• The extent of this contribution depends on the shape of the kernel function adopted and on the width accorded to it:

    f̂(x*) = (1/N) Σ_{i=1}^{N} K(x* − x(i)),   where ∫ K(t) dt = 1

• The quality of a kernel estimate depends less on the shape of K than on the value of the bandwidth h.
• A common form for K is the Normal (Gaussian) curve, with h as its spread parameter (standard deviation), i.e.,

    K(t) = C · exp(−t² / (2h²))

• where C is a normalization constant and t = x − x(i) is the distance of the query point x to data point x(i).
• The bandwidth h is equivalent to σ, the standard deviation (or width) of the Gaussian kernel function.

Kernel Density Estimation
• Kernel methods are closely related to nearest-neighbor methods.
• Naively, computing the kernel estimate at N points costs O(N²) kernel evaluations; for large data sets, fast algorithms are needed:
– Can use tricks learned from N-body problems, e.g., trees
– It is sufficient to bound the density and compute only until the bounds separate
• The choice of bandwidth is critical (analogy: the choice of histogram bin size):
– Small values of h lead to very spiky estimates (not much smoothing at all)
– Large values lead to oversmoothing
• Kernel density estimation is often described as non-parametric, because the model is largely data-driven, with no parameters in the conventional sense (except for the bandwidth h).
• Such data-driven smoothing techniques are useful for data interpretation, at least in one or two dimensions.
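To make the role of h concrete, here is a small one-dimensional sketch of the Gaussian kernel estimate defined above, with C = 1/(h·√(2π)) so that each kernel integrates to 1; the sample data and bandwidth values are illustrative only.

    import numpy as np

    def kde(x_star, data, h):
        # 1-D kernel estimate with a Gaussian kernel of bandwidth h:
        # f_hat(x*) = (1/N) * sum_i C * exp(-(x* - x_i)^2 / (2 h^2)).
        c = 1.0 / (h * np.sqrt(2.0 * np.pi))
        return np.mean(c * np.exp(-((x_star - data) ** 2) / (2.0 * h ** 2)))

    data = np.array([0.9, 1.0, 1.1, 3.0, 3.2])
    for h in (0.05, 0.3, 2.0):
        # Small h gives spiky estimates; large h oversmooths.
        print(h, [round(kde(x, data, h), 3) for x in (1.0, 2.0, 3.1)])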
Drawbacks
• In particular, as the number of variables in the predictor space increases, the number of data points required to obtain accurate estimates grows exponentially.
• This means that these "local neighborhood" models tend to scale poorly to high dimensions.
• Another drawback is the lack of interpretability of the model.
• Solutions:
– using a subset of relevant variables to construct the model
– transforming the original p variables into a new set of p' variables, where p' << p
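As a sketch of the second remedy, one common choice of transformation (not prescribed by the slides) is a principal-component projection of the p original variables onto p' derived variables before density estimation; the function name and dimensions are illustrative.

    import numpy as np

    def project_to_p_prime(data, p_prime):
        # Project p-dimensional data onto its top p_prime principal
        # components (via SVD of the centered data matrix), so that a
        # kernel estimate can be built in a much lower dimension.
        centered = data - data.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[:p_prime].T

    # Example: reduce 10-D data to 2-D before kernel density estimation.
    data = np.random.default_rng(0).normal(size=(100, 10))
    low_dim = project_to_p_prime(data, 2)
    print(low_dim.shape)  # (100, 2)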