MACHINE LEARNING WITH PYTHON
SEMESTER 5
UNIT - 4

UNSUPERVISED LEARNING ALGORITHMS:
INTRODUCTION TO CLUSTERING
Clustering is a fundamental technique in unsupervised learning that involves
grouping similar data points together. It is used to explore and uncover
patterns or structures within a dataset without any predefined labels or target
variables. The goal of clustering is to partition the data into distinct groups, or
clusters, such that data points within the same cluster are more similar to each
other than to points in different clusters.

K-MEANS CLUSTERING
K-means clustering is a popular and widely used algorithm for partitioning data
into distinct clusters. It is an iterative algorithm that aims to minimize the
within-cluster variance, measured as the sum of squared distances between
data points and their assigned cluster centroids.

Here's how the K-means algorithm works:

1. Initialization:
Begin by choosing the number of clusters, denoted as 'k', that you want to
identify in your data.
Randomly initialize the centroids of these 'k' clusters by selecting 'k' data
points from the dataset.

2. Assignment:
For each data point, calculate the distance between the data point and
each centroid.
Assign the data point to the cluster associated with the nearest centroid.
This is typically done based on Euclidean distance, but other distance
metrics can also be used.

3. Update:
Once all data points are assigned to a cluster, compute the new centroid of
each cluster. Each centroid is calculated as the mean of all the data points
assigned to that cluster.

4. Iteration:
Repeat steps 2 and 3 until convergence is achieved. Convergence occurs
when the assignments of data points to clusters no longer change
significantly or when a maximum number of iterations is reached.

5. Result:
After convergence, the K-means algorithm will have identified 'k' clusters,
with each data point belonging to one of the clusters.
The final result will include the cluster centers (representing the centroid of
each cluster) and the assignment of data points to their respective clusters.
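
As a concrete illustration of these steps, here is a minimal sketch using scikit-learn's
KMeans class; the toy data and the choice of k = 3 are assumptions made purely for
demonstration.

# Minimal K-means sketch with scikit-learn (toy data; k = 3 is illustrative).
import numpy as np
from sklearn.cluster import KMeans

# Small 2-D dataset; in practice X would be your own feature matrix.
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X)

print("Cluster labels:", kmeans.labels_)        # cluster assignment for each point
print("Centroids:\n", kmeans.cluster_centers_)  # final cluster centroids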

HIERARCHICAL CLUSTERING
Hierarchical clustering is a clustering algorithm that builds a hierarchical
structure of clusters by iteratively merging or splitting clusters. It does not
require a predefined number of clusters, unlike the K-means algorithm.
Hierarchical clustering assigns a data point to a cluster at each level of the
hierarchy, creating a tree-like structure called a dendrogram.

There are two main types of hierarchical clustering:

Agglomerative Hierarchical Clustering:


Agglomerative clustering initiates by treating each data point as a
distinct cluster.
In subsequent iterations, it merges the two closest clusters based on a
predefined distance measure.
This merging process persists until either all data points unite into a
single cluster or a defined stopping condition is fulfilled.
The outcome is a dendrogram illustrating the merging sequence and
hierarchical relationships among clusters.
The dendrogram allows cutting at a desired similarity level to derive a
specific number of clusters.

Divisive Hierarchical Clustering:


Divisive clustering adopts the opposite strategy to agglomerative
clustering.
It commences with all data points grouped into a single cluster.
Through each iteration, it recursively divides a cluster into two based on
a specified criterion.
This division process continues until either each data point forms its own
cluster or a designated stopping condition is achieved.
Similar to agglomerative clustering, the result is a dendrogram; however,
it displays the splitting process within the hierarchy.
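
A minimal sketch of agglomerative clustering follows, assuming SciPy and scikit-learn
are available; the toy data, Ward linkage, and the cut into two clusters are illustrative
choices, not fixed parts of the algorithm.

# Agglomerative clustering sketch: build a dendrogram, then cut it into clusters.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2], [2, 2], [8, 8], [9, 9], [1, 1], [8, 9]], dtype=float)

Z = linkage(X, method="ward")        # merge history of the agglomerative process
dendrogram(Z)                        # visualize the hierarchy
plt.title("Dendrogram")
plt.show()

labels = AgglomerativeClustering(n_clusters=2, linkage="ward").fit_predict(X)
print("Cluster labels:", labels)     # flat clustering obtained by cutting the hierarchy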

KOHONEN SELF-ORGANIZING MAPS
A Kohonen Self-Organizing Map (SOM), also known as a Self-Organizing Feature
Map, is an unsupervised learning algorithm that creates a low-dimensional
representation of high-dimensional data. It was developed by Teuvo Kohonen
in the 1980s.

SOMs are often used for visualizing and analyzing complex, high-dimensional
data. The algorithm maps the input data onto a grid of neurons, each with a
weight vector associated with it. The grid can have any shape, but it is usually
organized as a two-dimensional grid.

Here's how the Kohonen SOM algorithm works:

1. Initialization:
Randomly initialize the weight vectors of the neurons in the grid. These
weight vectors have the same dimensionality as the input data.

2. Training:
Select a random input vector from the dataset.
Compute the Euclidean distance between the input vector and the weight
vectors of all neurons.
Identify the best-matching unit (BMU) - the neuron with the closest weight
vector to the input vector.

3. Update:
Update the weight vectors of the BMU and its neighboring neurons to make
them more similar to the input vector.
The magnitude of the update depends on the learning rate, which is initially
high and gradually decreases over time.
The neighborhood size also decreases over time, allowing the algorithm to
refine the representation.

4. Iteration:
Repeat steps 2 and 3 for a specified number of iterations or until
convergence is achieved.

After training, the SOM represents the data in a low-dimensional grid where
similar input vectors are placed close together. This allows for visual analysis of
the data and identification of clusters or patterns. Each neuron in the SOM grid
can be associated with a specific cluster or category.
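
The training loop described above can be traced directly in NumPy. The grid size,
decay schedule, and random data in the sketch below are illustrative assumptions,
not the only way to configure a SOM.

# Minimal NumPy sketch of Kohonen SOM training (toy data and parameters).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3))                 # 200 samples with 3 features

grid_h, grid_w = 5, 5                    # 5 x 5 grid of neurons
weights = rng.random((grid_h, grid_w, X.shape[1]))      # step 1: random weight vectors

n_iter = 1000
lr0, sigma0 = 0.5, 2.0                   # initial learning rate and neighborhood radius

for t in range(n_iter):
    x = X[rng.integers(len(X))]                             # step 2: random input vector
    dists = np.linalg.norm(weights - x, axis=2)             # distance to every neuron
    bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best-matching unit

    lr = lr0 * np.exp(-t / n_iter)                          # learning rate decays over time
    sigma = sigma0 * np.exp(-t / n_iter)                    # neighborhood radius shrinks too

    # step 3: pull the BMU and its neighbors towards the input vector
    rows, cols = np.indices((grid_h, grid_w))
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    influence = np.exp(-grid_dist2 / (2 * sigma ** 2))
    weights += lr * influence[..., np.newaxis] * (x - weights)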

Some key properties and benefits of Kohonen SOMs include:

1. Dimensionality Reduction: SOMs enable the visualization and exploration of
high-dimensional data by mapping it onto a lower-dimensional grid.

2. Unsupervised Learning: SOMs do not require labeled data and can discover
patterns and relationships in an unsupervised manner.

3. Topological Preservation: SOMs preserve the topological structure of the
input data, meaning nearby input vectors remain close in the SOM grid.

4. Robustness: SOMs are robust to noise, as a noisy input vector will still find its
place within the grid based on its relationship to other vectors.

Kohonen Self-Organizing Maps have been widely used in various domains,
including data visualization, clustering analysis, image compression, anomaly
detection, and pattern recognition. They provide a powerful tool for
understanding complex datasets and finding underlying structures.

IMPLEMENTATION OF UNSUPERVISED ALGORITHMS


Unsupervised learning algorithms are used to discover patterns, clusters, and
relationships in data without the need for labeled examples. Here are some
common implementations of unsupervised algorithms (minimal code sketches
for a few of them follow the list):

1. K-means Clustering:
One of the most popular clustering algorithms.
Divides data points into K clusters, where K is a user-specified parameter.
Each data point is assigned to the cluster with the closest centroid.
The algorithm iteratively updates the cluster centroids until convergence.

2. Hierarchical Clustering:
As discussed earlier, this algorithm builds a hierarchy of clusters based on
the similarity between data points.
The choice of linkage method (e.g., single-linkage, complete-linkage,
average-linkage) and distance metric impacts the results.
Dendrograms can be used to visualize the clustering process.

3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
DBSCAN clusters data points based on proximity and the presence of a
minimum number of neighbors within a specified distance.
It excels in detecting clusters of various shapes and is robust against
noisy data.
Points not assigned to any cluster are considered outliers or noise.

4. Gaussian Mixture Models (GMM):


Assumes data points are generated from a blend of Gaussian
distributions.
Models data by combining K Gaussian distributions, where the value of K
is specified by the user.
The algorithm estimates Gaussian component parameters through an
Expectation-Maximization (EM) approach.

5. Self-Organizing Maps (SOM):


SOMs offer a low-dimensional representation of high-dimensional data
to aid visualization and pattern recognition.
The algorithm maps input data onto a grid of neurons and updates their
weights based on the similarity to the input vectors.

6. Principal Component Analysis (PCA):


A technique for reducing the dimensionality of data, projecting it from a
high-dimensional space to a lower-dimensional one.
It identifies principal components—directions capturing the most
variance in the data.
The lower-dimensional representation is achieved by projecting the data
onto these principal components.

7. t-SNE (t-Distributed Stochastic Neighbor Embedding):


Another dimensionality reduction technique primarily used for
visualization purposes.
It maps high-dimensional data into a two- or three-dimensional space
while preserving pairwise similarities between nearby points.
Particularly effective at preserving local structure and clustering patterns
in the data.
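
Minimal scikit-learn sketches for three of the algorithms above (DBSCAN, GMM, and
t-SNE) follow; the toy datasets and parameter values (eps, min_samples, n_components,
perplexity) are illustrative assumptions rather than recommended settings.

# DBSCAN: density-based clustering; label -1 marks noise points.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.manifold import TSNE

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)
print("DBSCAN labels:", DBSCAN(eps=3, min_samples=2).fit(X).labels_)

# Gaussian Mixture Model: fit two Gaussian components with EM.
rng = np.random.default_rng(0)
X2 = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X2)
print("GMM means:\n", gmm.means_)

# t-SNE: embed 20-dimensional data into 2-D for visualization.
X3 = rng.random((100, 20))
X3_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X3)
print("t-SNE embedding shape:", X3_2d.shape)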

FEATURE SELECTION AND DIMENSIONALITY REDUCTION
Feature selection and dimensionality reduction are techniques used to reduce
the number of features or dimensions in a dataset. They are important
preprocessing steps in machine learning, as they can improve model
performance, reduce overfitting, and enhance interpretability. Here's a brief
explanation of feature selection and dimensionality reduction:

Feature Selection:

Feature selection is the process of selecting a subset of relevant features from
the original feature set. The goal is to choose features that contribute the most
to the target variable while discarding irrelevant or redundant features. Benefits
of feature selection include reducing computation time, improving model
interpretability, and mitigating the risk of overfitting.

Popular techniques for feature selection include:

1. Filter Methods:
Use statistical measures like correlation, chi-square, or mutual information
to rank features.
Select the top-ranked features based on a predefined threshold or a fixed
number.

2. Wrapper Methods:
Involve evaluating the performance of different feature subsets using an
external machine learning algorithm.
Search for the best subset of features, typically through a backward or
forward selection process.

3. Embedded Methods:
Incorporate feature selection directly into the learning algorithm.
Model-specific techniques that determine feature importance during the
training phase, e.g., LASSO regularization or decision tree-based importance.
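
The sketch below illustrates a filter method (SelectKBest with a chi-square score) and
an embedded method (an L1-penalized model in the spirit of LASSO), assuming
scikit-learn and its built-in Iris dataset; the values of k and C are arbitrary
illustrative choices.

# Feature selection sketches: filter method and embedded method.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: rank features by chi-square score and keep the top 2.
X_filtered = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)
print("Filter method kept shape:", X_filtered.shape)

# Embedded method: L1 regularization drives weak coefficients toward zero,
# and SelectFromModel keeps only the features with non-zero importance.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
X_embedded = SelectFromModel(l1_model, prefit=True).transform(X)
print("Embedded method kept shape:", X_embedded.shape)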

DIMENSIONALITY REDUCTION
Dimensionality Reduction Overview:
Aims to decrease the number of dimensions in a dataset while retaining
maximal information.
Particularly useful for high-dimensional data suffering from the curse of
dimensionality or for visualizing data in lower-dimensional spaces.

Common Dimensionality Reduction Techniques:


a. Principal Component Analysis (PCA):
Identifies orthogonal axes (principal components) capturing
maximum data variance.
Enables projection onto a lower-dimensional space with minimal
information loss.
b. Singular Value Decomposition (SVD):
Decomposes the data matrix into three distinct matrices.
Yields a low-rank approximation that preserves critical features (see the
sketch after this list).
c. t-Distributed Stochastic Neighbor Embedding (t-SNE):
Proficient for visualizing high-dimensional data in reduced
dimensions.
Constructs a probability distribution over pairs of high-dimensional
points, learning a lower-dimensional map that retains pairwise
similarities.
d. Autoencoders:
Neural network architectures that reconstruct input data from a
compressed, lower-dimensional representation.
The intermediate (bottleneck) layer learns this compressed
representation, which can serve as the reduced-dimensional encoding.
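
A minimal sketch of technique (b) above, using scikit-learn's TruncatedSVD on random
toy data; keeping two components is an arbitrary illustrative choice.

# Truncated SVD: low-rank approximation used for dimensionality reduction.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((50, 10))                  # 50 samples, 10 features

svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)          # project onto the top 2 singular directions
print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", svd.explained_variance_ratio_)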

PRINCIPAL COMPONENT ANALYSIS
Principal Component Analysis (PCA) is a popular dimensionality reduction
technique used to transform a high-dimensional dataset into a lower-
dimensional space. It identifies the principal components, which are the
directions that capture the maximum variance in the data.

Here is a step-by-step explanation of how PCA works:

1. Standardize the data:


- PCA requires the data to be centered around zero; scaling features to unit
variance is also common practice.
- If the features in the dataset are on different scales, it is recommended to
standardize them.

2. Compute the covariance matrix:


- The covariance matrix measures the relationship between pairs of features.
- For a standardized data matrix X with n samples as rows, it is computed as
X^T X / (n - 1).

3. Compute the eigenvectors and eigenvalues of the covariance matrix:


- The eigenvectors represent the principal components, and the eigenvalues
represent their corresponding variances.
- They are computed using linear algebra techniques.

4. Select the principal components:


- Principal components are chosen based on their corresponding eigenvalues.
- The components with higher eigenvalues capture more variance and are
considered more important.

5. Project the data onto the selected principal components:


- The original data is transformed into a lower-dimensional space by
projecting it onto the selected principal components.
- This is done by taking the dot product of the standardized data matrix with
the matrix whose columns are the selected eigenvectors.

6. Determine the explained variance:


- The proportion of variance explained by each principal component can be
calculated using the corresponding eigenvalues.
- This information helps in understanding how much information is retained
in the lower-dimensional representation.
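
The six steps can be traced directly in NumPy. The sketch below uses random toy data
and keeps two components; both are illustrative assumptions.

# Step-by-step PCA sketch in NumPy, mirroring the procedure above.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))                               # 100 samples, 5 features

# 1. Standardize: zero mean and unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data.
cov = X_std.T @ X_std / (X_std.shape[0] - 1)

# 3. Eigenvectors (principal components) and eigenvalues (variances).
eigvals, eigvecs = np.linalg.eigh(cov)                 # eigh: covariance matrix is symmetric

# 4. Sort by decreasing eigenvalue and keep the top 2 components.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
components = eigvecs[:, :2]

# 5. Project the standardized data onto the selected components.
X_pca = X_std @ components                             # shape (100, 2)

# 6. Explained variance ratio of the kept components.
print("Explained variance ratio:", eigvals[:2] / eigvals.sum())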
