Cluster analysis is a statistical technique for grouping similar objects or points into clusters, widely used in fields like machine learning and bioinformatics. It includes various types such as partitioning clustering (e.g., K-means), hierarchical clustering (agglomerative and divisive), density-based clustering (e.g., DBSCAN), grid-based clustering, model-based clustering (e.g., Gaussian Mixture Models), and subspace clustering. Each type has its unique approach and advantages for analyzing data.
Cluster Analysis
Cluster analysis, also known as clustering, is a statistical technique used in machine learning and data mining that groups objects or points so that objects in the same group (a cluster) are more similar to each other than to those in other groups. It is a core task of exploratory data analysis and is used in many fields, including pattern recognition, image analysis, information retrieval, and bioinformatics.

Types of Cluster Analysis
• Partitioning Clustering: This type of clustering divides data into a set of mutually exclusive clusters. The best-known method in this category is the K-means algorithm, where ‘K’ is the pre-specified number of clusters. These methods typically start from a random partitioning of the data and refine it iteratively.
• Hierarchical Clustering: This type of clustering not only groups the data but also builds a hierarchy of clusters, much like a binary tree. It comes in two flavors:
  • Agglomerative (Bottom-Up): Each data point starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
  • Divisive (Top-Down): All data points start in one cluster, and splits are performed recursively as one moves down the hierarchy.
• Density-Based Clustering: These algorithms look for regions of the feature space with a high density of observations. The most famous of these is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). It defines a neighborhood around each data point; if that neighborhood contains at least a minimum number of points, a cluster is started. Points that end up in no cluster are treated as noise.
• Grid-Based Clustering: These algorithms quantize the space into a finite number of cells that form a grid and perform all clustering operations on that grid. Their primary advantage is fast processing time, which typically depends on the number of cells in each dimension of the quantized space.
• Model-Based Clustering: These algorithms hypothesize a model for each cluster and find the best fit of the data to that model. A common example is the Gaussian Mixture Model, usually fitted with the Expectation-Maximization algorithm. The advantage is that the model provides a probabilistic framework for estimating the characteristics of the process that generated the data.
• Subspace Clustering or Biclustering: In standard clustering, an object belongs to exactly one cluster. In subspace clustering, an object can belong to more than one cluster, and each cluster is associated with a subset of the dimensions. This is particularly useful for high-dimensional data, where each dimension represents a feature of the data.
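
As a rough illustration of the partitioning approach described above, the following is a minimal K-means sketch in plain Python. It is not taken from the slides: the function names `kmeans` and `sq_dist`, the `seed` parameter, and the choice to start from K randomly chosen data points (rather than a random partition) are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Minimal K-means sketch: returns (labels, centroids).

    Assumption: initial centroids are k distinct data points picked at random.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (100.0, 100.0), (100.0, 101.0), (101.0, 100.0)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

Which cluster receives label 0 depends on the random start, so only the grouping, not the label values, should be relied on.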
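
The agglomerative (bottom-up) procedure described above can be sketched in a few lines. This is an illustrative example only. The slides do not say how the distance between two clusters is measured, so single linkage (distance between the closest pair of members) is assumed here, and the function name `agglomerative` and its return format are hypothetical.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Bottom-up clustering sketch using single linkage (an assumption).

    Returns (clusters, merges): clusters are lists of point indices, and
    merges records each merge as (left_members, right_members, distance).
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters, merges


# Hypothetical 1-D toy data, written as 1-tuples.
data = [(0.0,), (1.0,), (10.0,), (11.0,), (50.0,)]
clusters, merges = agglomerative(data, n_clusters=2)
print(clusters)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

The recorded merges, read in order, correspond to moving up the hierarchy. Stopping at `n_clusters` is one way to cut the tree into a flat clustering.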
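
The neighborhood rule described above for DBSCAN can be sketched as follows. This is a simplified illustration, not a reference implementation: the names `dbscan`, `eps` (neighborhood radius), `min_pts` (minimum neighborhood size, counting the point itself), and the `NOISE` label of -1 are assumptions for this example, and the neighbor search is a plain scan over all points.

```python
from math import dist

NOISE = -1  # assumed label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Simplified DBSCAN sketch: returns one label per point."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # dense neighborhood found: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is dense too: keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical toy data: two dense squares and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))
```

Unlike K-means, the number of clusters is not specified in advance. It follows from the density parameters.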