Detailed Clustering in Machine Learning Notes
-------------------------------
Clustering is a type of unsupervised learning in which we group data points into distinct clusters,
such that the points in each cluster are more similar to each other than to those in other clusters.
It is widely used for data exploration, pattern recognition, and as a pre-processing step for other
algorithms.
--------------------------------
1. **Types of Clustering**:
a) **Partitioning Clustering**:
- Partitioning methods divide the data set into non-overlapping subsets (clusters). A popular example is K-Means:
- **K-Means**: An iterative algorithm that assigns each point to one of \(K\) clusters based on the nearest cluster mean (centroid). The algorithm minimizes the intra-cluster variance, i.e. the within-cluster sum of squared distances (see the sketch after this list).
- These methods are sensitive to the initial selection of centroids, and the number of clusters \(K\) must be specified in advance.
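As a quick illustration, here is a minimal K-Means run using scikit-learn; the synthetic `make_blobs` data and the choice of \(K=3\) are assumptions for the example, not part of these notes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated blobs (an assumed example dataset).
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init runs the algorithm several times from different initial centroids
# to reduce the sensitivity to initialization mentioned above.
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_)  # final centroids
print(km.inertia_)          # the intra-cluster variance K-Means minimizes
```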
b) **Distribution-Based Clustering**:
- This type of clustering assumes that the data is generated by a mixture of several probability distributions (usually Gaussian). The goal is to estimate the parameters of these distributions.
- **Gaussian Mixture Models (GMM)**: A probabilistic model where the data points are modeled as a mixture of several Gaussian distributions. Each data point has a probability of belonging to each cluster.
- **Expectation Maximization (EM)**: An iterative algorithm used for fitting a GMM. The algorithm alternates between estimating each point's cluster probabilities (Expectation step) and updating the distribution parameters to maximize the likelihood (Maximization step); a short fitting example follows this list.
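A minimal GMM fit sketched with scikit-learn's `GaussianMixture`; the synthetic 1-D data and the two-component setting are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 1-D Gaussian populations, stacked as a column vector.
X = np.concatenate([rng.normal(-2, 0.5, 200),
                    rng.normal(3, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# predict_proba returns the soft memberships: each point's probability
# of belonging to each Gaussian component.
print(gmm.means_.ravel(), gmm.weights_)
print(gmm.predict_proba(X[:3]))
```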
c) **Hierarchical Clustering**:
- Hierarchical clustering builds a hierarchy of clusters by either starting with individual data
points and merging them (agglomerative) or starting with all points in one cluster and splitting them
(divisive).
- **Agglomerative Clustering**: Begins with each data point as its own cluster and iteratively merges the closest pair of clusters until one cluster (or a stopping criterion) remains.
- **Divisive Clustering**: Starts with a single cluster that contains all the data points and recursively splits it into smaller clusters.
- A key advantage of hierarchical clustering is that the number of clusters does not need to be predefined; the hierarchy can be cut at any level after it is built (see the sketch below).
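A small agglomerative sketch using SciPy; the five sample points and the Ward linkage choice are assumptions. Note how the hierarchy is built once and only cut into a cluster count afterwards.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])

# Ward linkage merges, at each step, the pair of clusters that least
# increases within-cluster variance, building the full hierarchy.
Z = linkage(X, method="ward")

# The hierarchy can be cut afterwards -- here into 3 clusters -- which is
# why the cluster count need not be fixed in advance.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```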
d) **Fuzzy Clustering**:
- In fuzzy clustering, each data point can belong to multiple clusters with different degrees of membership.
- **Fuzzy C-Means**: The algorithm assigns each data point a membership value for each cluster, and the sum of the membership values for each data point is equal to 1. This allows for soft cluster boundaries (a sketch follows).
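scikit-learn does not ship Fuzzy C-Means, so the following is a compact NumPy sketch of the standard update rules; the data, the cluster count `c=2`, and the fuzzifier `m=2` are assumed for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances from every point to every center (epsilon avoids /0).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Standard FCM update: inverse-distance ratios, renormalized.
        U = 1.0 / d ** (2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
centers, U = fuzzy_c_means(X)
print(centers)       # cluster centers
print(U[:3].sum(1))  # each row of memberships sums to 1
```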
2. **BIRCH Algorithm**:
- The BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm is an efficient clustering algorithm for large datasets. It constructs a Clustering Feature (CF) tree, which compactly summarizes the data.
- The CF tree is built incrementally, where each node in the tree represents a cluster summary. BIRCH uses this structure to efficiently compute clusters without needing to store the entire dataset.
- The BIRCH algorithm is particularly useful when the dataset is too large to fit into memory; a streaming example follows.
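A brief sketch with scikit-learn's `Birch`; the `threshold`, `branching_factor`, and chunking scheme are illustrative assumptions. `partial_fit` feeds the CF tree in chunks, mimicking data that arrives piece by piece.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=4, random_state=1)

# The CF tree is grown incrementally; each partial_fit call inserts one
# chunk, so the full dataset never has to be held at once by the model.
brc = Birch(threshold=0.8, branching_factor=50, n_clusters=4)
for chunk in np.array_split(X, 10):
    brc.partial_fit(chunk)

print(brc.predict(X[:5]))  # cluster labels for a few points
```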
3. **CURE Algorithm**:
- CURE (Clustering Using REpresentatives) is an algorithm designed for clustering large datasets.
- The algorithm addresses the issue of outliers and high dimensionality by selecting a fixed number of representative points from each cluster. These representative points are shrunk toward the cluster centroid, and the algorithm uses a combination of distance- and centroid-based techniques to build the final cluster structure.
- CURE is efficient and effective for clustering large datasets with clusters of varying shapes and sizes (see the sketch below).
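CURE has no implementation in scikit-learn, so below is a minimal sketch of its central idea only: greedily picking well-scattered representative points and shrinking them toward the centroid. The `n_reps` and `alpha` values are assumptions; a full CURE would repeatedly merge the clusters whose representatives are closest.

```python
import numpy as np

def cure_representatives(cluster_points, n_reps=4, alpha=0.3):
    centroid = cluster_points.mean(axis=0)
    # Start from the point farthest from the centroid.
    start = np.argmax(np.linalg.norm(cluster_points - centroid, axis=1))
    reps = [cluster_points[start]]
    # Greedily pick points farthest from the representatives chosen so
    # far, so the set is well scattered across the cluster.
    while len(reps) < min(n_reps, len(cluster_points)):
        dists = np.min([np.linalg.norm(cluster_points - r, axis=1)
                        for r in reps], axis=0)
        reps.append(cluster_points[np.argmax(dists)])
    # Shrinking toward the centroid dampens the influence of outliers.
    return np.array([r + alpha * (centroid - r) for r in reps])

pts = np.random.default_rng(0).normal(size=(30, 2))
print(cure_representatives(pts))
```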
4. **Gaussian Mixture Models and Expectation Maximization**:
- **Gaussian Mixture Models (GMM)**: This is a probabilistic model used for clustering that assumes the data is generated by a mixture of several Gaussian distributions. Each cluster in a GMM is represented by a Gaussian distribution, and the model estimates the parameters (mean, covariance, and mixture weight) of each.
- **Expectation Maximization (EM)**: The EM algorithm is used to estimate the parameters of the GMM. It alternates between two steps:
- **Expectation (E-step)**: Compute the probability (responsibility) that each data point belongs to each cluster, given the current parameters.
- **Maximization (M-step)**: Update the parameters (mean, covariance, and mixture weights) of the Gaussians based on the probabilities computed in the E-step (a bare-bones loop follows this list).
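To make the alternation concrete, here is a bare-bones EM loop for a two-component 1-D GMM in NumPy; the synthetic data and initial parameter guesses are assumptions for the sketch.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1.5, 300)])

# Initial guesses for means, std devs, and mixture weights.
mu, sigma, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = w * norm.pdf(x[:, None], mu, sigma)   # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and std devs from responsibilities.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(mu, sigma, w)  # should approach (0, 5), (1, 1.5), (0.5, 0.5)
```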
5. **Parameter Estimation**:
- **Maximum Likelihood Estimation (MLE)**: MLE is a method for estimating the parameters of a statistical model. It chooses the parameter values that maximize the likelihood function, i.e. the probability of the observed data under the model.
- **Maximum A Posteriori (MAP)**: MAP is similar to MLE, but it incorporates a prior probability distribution on the parameters, representing any prior knowledge we have about them. MAP estimation maximizes the posterior probability, which combines the likelihood and the prior (in symbols below).
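In symbols (a standard formulation, stated here for reference): MLE chooses \(\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \, p(D \mid \theta)\), while MAP chooses \(\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \, p(D \mid \theta)\,p(\theta)\), where \(D\) is the observed data; under a uniform prior \(p(\theta)\), MAP reduces to MLE.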
6. **Applications of Clustering**:
- **Image Segmentation**: Clustering is used to group similar pixels in an image, allowing regions or objects to be separated.
- **Market Segmentation**: Businesses use clustering to group customers with similar behaviors for targeted marketing.
- **Anomaly Detection**: Clustering can identify outliers or anomalous data points that do not fit well into any cluster.
- **Social Network Analysis**: Clustering can be used to detect communities in social networks, where nodes (individuals) within the same cluster are more strongly connected or more similar to each other than to the rest of the network.
- **Document Categorization**: In text mining, clustering can be used to group similar documents by topic.