
UNIT – IV: Unsupervised Learning

Clustering basics (Partitioned, Hierarchical and Density based) – K-Means clustering – K-Mode clustering – Self organizing maps – Expectation maximization – Principal Component Analysis – Kernel PCA – tSNE (t-distributed stochastic neighbour embedding) – Metrics & Error Correction.

Clustering is a type of unsupervised machine learning technique where data is grouped into
clusters based on similarity. The goal is to find inherent patterns or structures within data.
Clustering methods can be broadly categorized into three types: Partitioned Clustering,
Hierarchical Clustering, and Density-based Clustering.

1. Partitioned Clustering

In partitioned clustering, the data is divided into a set of distinct, non-overlapping clusters. Each data point is assigned to exactly one cluster.

● K-means clustering is the most popular method under partitioned clustering.


Process:
1. Initialize K centroids randomly.
2. Assign each data point to the nearest centroid based on Euclidean distance.
3. Update the centroids as the mean of points in each cluster.
4. Repeat until centroids stabilize.

Key Features: Fast and efficient but sensitive to outliers.

● K-medoids Clustering

Process:

1. Initialize K medoids (actual data points).


2. Assign each data point to the nearest medoid.
3. Update medoids by selecting the point that minimizes total distance within the
cluster.
4. Repeat until the medoids stabilize.

Key Features: More robust to outliers than K-means.

● K-modes Clustering

Process:

1. Initialize K modes (categorical prototypes).


2. Assign each data point to the nearest mode based on dissimilarity (number of
mismatches).
3. Update modes by selecting the most frequent category for each attribute.
4. Repeat until modes stabilize.

Key Features: Designed for clustering categorical data effectively.


2. Hierarchical Clustering

Hierarchical clustering builds a tree-like structure (dendrogram) that groups data based on
their similarity. This method does not require the number of clusters to be defined upfront.

There are two types of hierarchical clustering:

● Agglomerative (Bottom-up approach):


○ Starts by treating each data point as a separate cluster.
○ It then iteratively merges the two most similar clusters until only one cluster
remains.
● Divisive (Top-down approach):
○ Starts with all data points in a single cluster and recursively splits the most
dissimilar clusters.

The similarity between clusters can be calculated using different linkage methods (e.g.,
single-linkage, complete-linkage, average-linkage).
3. Density-Based Clustering

Density-based clustering identifies clusters based on the density of points in the data space.
It is well-suited for identifying clusters of arbitrary shape and handling outliers effectively.

● DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the
most popular algorithm in this category.
○ It groups together closely packed points (points that are close to each other)
and marks points that are in low-density regions as outliers.
○ The algorithm requires two parameters: the epsilon (maximum distance
between points in a cluster) and minPts (the minimum number of points
required to form a dense region).
● OPTICS (Ordering Points to Identify the Clustering Structure) is another
density-based algorithm that works similarly to DBSCAN but produces a more
detailed view of the cluster structure.

K-means Clustering Algorithm

K-means clustering is a partitioned clustering technique that groups numerical data into K
clusters. It minimizes a clustering cost function (sum of squared distances) to form
well-defined clusters.

K-means uses a similarity measure (typically Euclidean distance) and iteratively adjusts
cluster centers (centroids) to minimize the total variance within clusters.

Steps in the K-means Algorithm

Let:

● X={x1,x2,…,xn} be the set of data points.


● V={v1,v2,…,vk} be the set of K cluster centers.
1. Initialize Cluster Centers
○ Randomly select K initial cluster centers (centroids).
2. Assign Points to Nearest Cluster
○ For each data point xi, calculate its distance to each cluster center vj.
○ Assign xi to the cluster with the nearest center based on the minimum distance.
3. Update Cluster Centers
○ Recompute each cluster center vj as the mean of the data points currently assigned to that cluster: vj = (1/|Cj|) Σ xi over all xi in cluster Cj.
4. Recalculate Distances
○ Compute the distance between each data point and the updated cluster
centers.
5. Check for Convergence
○ If no data point has changed its cluster assignment, stop.
○ Otherwise, repeat from Step 2.
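As an illustration only (not part of the original notes), the steps above can be sketched in NumPy roughly as follows; the random initialisation, iteration cap, and stopping test are simplifying assumptions:

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Steps 4-5: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

labels, centers = kmeans(np.random.rand(100, 2), k=3)   # placeholder data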

The k in k-means clustering represents the number of clusters into which the data is divided.
Significance of K:

1. Determines Cluster Count:


○ Specifies the number of groups to segment the data into.
2. Controls Model Complexity:
○ A small k might oversimplify, while a large k could overfit.
3. Directly Affects Results:
○ Proper selection of k ensures meaningful and well-separated clusters.
4. Methods to Choose k:
○ Elbow Method: Find the k where the reduction in within-cluster variance
slows down.
○ Silhouette Score: Measures how similar an object is to its cluster versus
others.

Key Points

● Objective: Minimize the sum of squared distances between points and their
assigned cluster centroids.
● Complexity: Iterative algorithm with time complexity O(n×k×d), where n is the
number of points, k is the number of clusters, and d is the dimensionality.
● Convergence: Guaranteed to converge, but may reach a local minimum depending
on the initial cluster centers.

Advantages

● Simple and efficient for large datasets.


● Easy to implement and interpret.
Disadvantages

● Sensitive to the choice of initial centroids.


● Requires specifying the number of clusters (K) in advance.
● Assumes clusters are spherical and evenly sized.

PROBLEMS:

1) The data mining task is to cluster the following points into three clusters: A1(2,10), A2(2,5), A3(8,4), B1(5,8), B2(7,5), B3(6,4), C1(1,2), C2(4,9). The distance function is Manhattan distance. Suppose we initially assign A1, B1, C1 as the centers of the clusters. Use the k-means algorithm to show: a) the three clusters after the first round of execution, and b) the final three clusters.

https://www.youtube.com/watch?v=KzJORp8bgqs&list=PL4gu8xQu0_5KiYnRlueicckEmpFAi
RD5Y&index=5

2)https://www.youtube.com/watch?v=nWmeC61DgBo&list=PL4gu8xQu0_5KiYnRlueicckEm
pFAiRD5Y&index=8
This example uses the L1 (Manhattan) distance.

Selecting the optimal number of clusters (K) is crucial in clustering algorithms like
K-means. Two commonly used methods are:
https://www.youtube.com/watch?v=wW1tgWtkj4I

1. Elbow Method

The Elbow Method helps find the optimal number of clusters by analyzing the Within-Cluster Sum of Squares (WCSS), also called inertia. WCSS is computed for a range of values of k and plotted against k; the plot forms an elbow-like shape, and the point where the curve bends is taken as the optimal number of clusters.

Pros: Simple to understand and apply.

Cons: The elbow point is sometimes subjective and not always clear.
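For instance, a short scikit-learn sketch of the Elbow Method (assuming scikit-learn and matplotlib are installed; X is placeholder random data and the range of k is arbitrary):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)                      # placeholder data
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                    # within-cluster sum of squares
plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS (inertia)")
plt.show()                                      # look for the bend ('elbow') in the curve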


2. Silhouette Method

The Silhouette Method evaluates the quality of clusters by measuring how similar data points
are to their own cluster compared to other clusters.

https://www.youtube.com/watch?v=FGXkbawTHRQ&list=PL4gu8xQu0_5KiYnRlueicckEmpF
AiRD5Y&index=27

Pros: Provides a clear measure of cluster quality.

Cons: More computationally intensive than the Elbow Method.
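A corresponding scikit-learn sketch of the Silhouette Method (placeholder data again; silhouette_score comes from sklearn.metrics):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 2)                      # placeholder data
for k in range(2, 11):                          # silhouette needs at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, silhouette_score(X, labels))       # prefer the k with the highest score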


K-modes Algorithm (For Categorical Data)

Purpose: To cluster categorical data by minimizing dissimilarity.

Steps:

1. Initialize Modes
○ Arbitrarily select k objects as initial cluster modes.
2. Assign Objects to Clusters
○ Assign each object to the cluster with the most similar mode based on the
least number of mismatches (dissimilarity).
3. Update Modes
○ For each cluster, update the mode by finding the most frequently occurring
value for each attribute (column).
4. Repeat Until Convergence
○ Repeat steps 2 and 3 until the clusters remain unchanged in two consecutive
iterations.
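To make the matching dissimilarity concrete, here is a tiny illustrative sketch (only the assignment step, not the full K-modes algorithm); the records and modes are made-up examples:

import numpy as np

def matching_dissimilarity(a, b):
    # number of attribute positions where two categorical records differ
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def nearest_mode(x, modes):
    # index of the mode with the fewest mismatches to record x
    return int(np.argmin([matching_dissimilarity(x, m) for m in modes]))

# toy categorical records: (colour, size, shape)
X = [("red", "S", "round"), ("red", "M", "round"), ("blue", "L", "square")]
modes = [("red", "S", "round"), ("blue", "L", "square")]
print([nearest_mode(x, modes) for x in X])      # cluster index for each record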
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Purpose: DBSCAN clusters data points based on density, identifying core points, noise, and
clusters without needing to specify the number of clusters in advance.

Key Concepts:

1. Neighborhood (ε): Objects within a radius ε of a point.


2. Core Object: A point with at least minPts neighbors within ε.
3. Directly Density Reachable: Points within the ε radius of a core point (i.e., objects reachable directly from a core object).
4. Density Reachable: Points reachable through a chain of core points (i.e., objects reachable from directly density reachable objects).

Steps in DBSCAN Algorithm:

1. Select Random Object


○ Choose an unclustered point randomly.
2. Check Core Object
○ If the point has at least minPts neighbors within ε, it is a core object.
○ If not, mark it as noise and go back to Step 1.
3. Expand Cluster
○ Add all directly density reachable points to a candidate set.
4. Process Candidate Set
○ Remove a point from the candidate set and repeat Step 2 for it.
○ Continue until the candidate set is empty.
5. Form Cluster
○ Group all density reachable points together into one cluster.
6. Repeat
○ Continue until all points are either clustered or marked as noise.
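In practice DBSCAN is usually run through a library. A minimal scikit-learn sketch, with placeholder data and arbitrary eps / min_samples values:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(300, 2)                             # placeholder data
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)
# labels of -1 mark noise points; other integers are cluster ids
print(set(labels))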

Ex1.https://www.youtube.com/watch?v=-p354tQsKrs&list=PL4gu8xQu0_5KiYnRlueicckEmp
FAiRD5Y&index=9
Ex2.https://www.youtube.com/watch?v=ZOLYaa9Jex0&list=PL4gu8xQu0_5KiYnRlueicckEm
pFAiRD5Y&index=10

Advantages: Can detect clusters of arbitrary shape; automatically identifies noise (outliers); no need to specify the number of clusters in advance.

Disadvantages: Sensitive to the choice of ε and minPts; not suitable for datasets with varying densities.

Self organizing maps

Self-Organizing Maps (SOMs), also known as Kohonen Maps, are a type of artificial
neural network used for unsupervised learning. They are mainly used for dimensionality
reduction and visualization of high-dimensional data.

SOMs map high-dimensional input data onto a (usually two-dimensional) grid of neurons. Each neuron in the grid has a weight vector of the same dimension as the input data, and the goal is to train the network so that similar data points in the input space are mapped close to each other on the grid.
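As a sketch of the training idea only (not an implementation from these notes), the NumPy code below repeatedly finds the best matching unit (BMU) for a random sample and pulls the BMU and its grid neighbours toward that sample; the grid size, learning rate, and decay schedules are arbitrary choices:

import numpy as np

def train_som(X, grid_h=5, grid_w=5, n_iters=1000, lr0=0.5, sigma0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    weights = rng.random((grid_h, grid_w, d))            # one weight vector per neuron
    coords = np.array([[i, j] for i in range(grid_h)
                       for j in range(grid_w)]).reshape(grid_h, grid_w, 2)
    for t in range(n_iters):
        x = X[rng.integers(len(X))]                      # pick a random input sample
        dists = np.linalg.norm(weights - x, axis=2)      # distance from x to every neuron
        bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best matching unit
        lr = lr0 * np.exp(-t / n_iters)                  # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iters)            # shrinking neighbourhood radius
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        h = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2)) # neighbourhood weights on the grid
        weights += lr * h[..., None] * (x - weights)     # pull neurons toward the sample
    return weights

som = train_som(np.random.rand(500, 3))                  # placeholder data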

Advantages of SOMs:

● Effective for visualizing high-dimensional data.


● Can handle both numerical and categorical data.
● Good for discovering hidden patterns or structures in data.

Disadvantages of SOMs:

● Sensitive to initialization and parameters (like learning rate).


● Computationally expensive for large datasets.
● May require a lot of iterations to converge.

Applications of SOMs:

● Data Visualization
● Clustering
● Feature Mapping
● Pattern Recognition

EX:https://www.youtube.com/watch?v=InVvyioWDlw&list=PL4gu8xQu0_5JK6KmQi-Qx5hI3
W13RJbDY&index=24
EM Algorithm (Expectation-Maximization Algorithm)

The Expectation-Maximization (EM) algorithm is a powerful statistical method used for parameter estimation in models with latent variables (hidden or unobserved variables). It is commonly used for clustering, particularly in Gaussian Mixture Models (GMMs), and in other probabilistic models.

The EM algorithm iteratively estimates the parameters of the model by alternating between
two steps: Expectation (E-step) and Maximization (M-step).

Steps in the EM Algorithm:

1. Initialization
○ Start with initial guesses for the parameters of the model (e.g., means,
variances, and mixing coefficients in a GMM).
2. E-step (Expectation Step)
○ Given the current parameter estimates, calculate the expected value of the
latent (hidden) variables based on the observed data.
○ This step computes the probability of the data belonging to different clusters
(or distributions) using the current parameters.
3. M-step (Maximization Step)
○ Update the parameters of the model by maximizing the likelihood function (or
minimizing the negative log-likelihood) based on the expected values
calculated in the E-step.
4. Repeat
○ Repeat the E-step and M-step until the parameters converge (i.e., the change
in the parameter values between iterations is below a threshold).
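For example, a Gaussian Mixture Model fitted by EM can be sketched with scikit-learn as follows (synthetic two-blob data; GaussianMixture alternates the E- and M-steps internally):

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.vstack([np.random.normal(0, 1, (100, 2)),
               np.random.normal(5, 1, (100, 2))])        # two synthetic blobs
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)                # estimated component means (M-step output)
print(gmm.predict_proba(X[:3]))  # soft cluster responsibilities (E-step output)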

Applications of the EM Algorithm:

● Clustering
● Missing Data
● Image and Speech Recognition
● Hidden Markov Models (HMMs)

Advantages of the EM Algorithm:

● Can handle models with hidden (latent) variables.
● Flexible and widely applicable across various probabilistic models.
● Often finds a good solution even with complex, high-dimensional data.

Disadvantages of the EM Algorithm:

● Local Optima: It is sensitive to the initial parameter estimates and may converge to
a local optimum.
● Computational Complexity: Can be computationally expensive, especially for large
datasets and models with many parameters.
● Convergence Speed: The algorithm may require many iterations to converge.

Example:

You have a bag with candies of different colors (e.g., Red, Green, and Blue), but you
don't know the exact proportion of each color. You use the EM algorithm to estimate
these proportions.

Step 1: Expectation Step (E Step)

1. You take a candy out of the bag without looking at its color.
2. A friend guesses the color of the candy based on prior knowledge, providing both a
guess and their confidence level for each color.
3. For example:
a. First candy: 80% chance Red, 10% Green, 10% Blue.
b. Second candy: 30% Red, 60% Green, 10% Blue.
c. Third candy: 20% Red, 10% Green, 70% Blue.

Step 2: Maximization Step (M Step)

You use your friend's guesses to estimate the number of candies of each color.

Average the confidence values for each color across all guesses:

Red: (0.8 + 0.3 + 0.2) / 3 ≈ 0.43 (43%)

Green: (0.1 + 0.6 + 0.1) / 3 ≈ 0.27 (27%)

Blue: (0.1 + 0.1 + 0.7) / 3 = 0.30 (30%)
Update your estimates of the color proportions in the bag based on these calculations.
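A tiny sketch of this M-step calculation, using the confidence values from the example above:

import numpy as np

# each row: the friend's confidence (Red, Green, Blue) for one candy
responsibilities = np.array([[0.8, 0.1, 0.1],
                             [0.3, 0.6, 0.1],
                             [0.2, 0.1, 0.7]])
proportions = responsibilities.mean(axis=0)   # average per colour
print(proportions)                            # approximately [0.43, 0.27, 0.30]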

Step 3: Repeat

● Go back to the E-step and continue the process with updated guesses and
estimates.
● Over multiple iterations, the guesses and estimates converge to the actual
proportions of candies in the bag.

AGGLOMERATIVE CLUSTERING:

Linkage refers to the methods used to determine the distance between clusters in
hierarchical clustering algorithms. These methods define how to compute the "distance"
between two clusters based on the individual distances between their members.

1. Single Linkage: In single linkage clustering, the distance between two clusters is
defined as the minimum distance between any single pair of points in the two clusters.

Advantages:

○ Can form elongated or irregularly shaped clusters.


○ Simple to compute.

Disadvantages:

○ Sensitive to noise and outliers (since a single pair of points can influence the
distance calculation).
○ May result in chaining, where clusters grow by connecting distant points.

https://www.youtube.com/watch?v=tXYAdGn-SuM
2. Complete Linkage: In complete linkage clustering, the distance between two clusters is defined as the maximum distance between any pair of points in the two clusters.

Advantages:

○ Tends to produce compact, spherical clusters.
○ Less sensitive to noise and outliers compared to single linkage.

Disadvantages:

○ May be less flexible in handling clusters of different shapes.
○ Can result in a more rigid hierarchical structure.
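Both linkage strategies can be sketched with SciPy's hierarchical clustering utilities (placeholder data; cutting the tree into three clusters is an arbitrary choice):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                                # placeholder data
Z_single = linkage(X, method="single")                   # min distance between clusters
Z_complete = linkage(X, method="complete")               # max distance between clusters
labels = fcluster(Z_complete, t=3, criterion="maxclust") # cut the dendrogram into 3 clusters
print(labels)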

https://www.youtube.com/watch?v=0A0wtto9wHU&t=25s

Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique used to reduce the complexity of data while preserving as much information as possible. It transforms data into a new coordinate system where the axes (called principal components) are ordered by the variance of the data along them. PCA is commonly used for feature extraction, noise reduction, and visualization.

PCA works as a dimensionality reduction technique by transforming the data into a new set of axes, called principal components, which are ranked by the amount of variance they capture in the data.

How PCA Works:

1. Standardize the Data (Optional but recommended):


○ If the data features have different scales, it's important to standardize them
(i.e., subtract the mean and divide by the standard deviation) to bring them to
the same scale.
2. Compute the Covariance Matrix:
○ Calculate the covariance matrix to understand how the features vary with
respect to each other. The covariance matrix captures the relationships
between different features.
3. Compute the Eigenvalues and Eigenvectors:
○ Calculate the eigenvalues and eigenvectors of the covariance matrix.
Eigenvectors represent the directions of maximum variance (the principal
components), and eigenvalues represent the magnitude of the variance along
those directions.
4. Sort Eigenvalues and Eigenvectors:
○ Sort the eigenvectors in descending order of their corresponding eigenvalues.
This gives the principal components in order of importance (most to least
variance explained).
5. Select the Top k Principal Components:
○ Choose the top k eigenvectors (based on the highest eigenvalues) to form a
new feature space. The number of principal components (k) to retain depends
on the desired amount of variance to capture.
6. Transform the Data:
○ Project the original data onto the selected principal components to obtain the
reduced-dimensional representation of the data.
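These steps can be sketched directly in NumPy (illustrative only; the data is random and keeping k = 2 components is arbitrary):

import numpy as np

def pca(X, k):
    X_centered = X - X.mean(axis=0)              # step 1: centre the data
    C = np.cov(X_centered, rowvar=False)         # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # step 3: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]            # step 4: sort by decreasing variance
    components = eigvecs[:, order[:k]]           # step 5: keep the top-k components
    return X_centered @ components               # step 6: project the data

X = np.random.rand(100, 5)                       # placeholder data
print(pca(X, 2).shape)                           # (100, 2)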

Mathematical Formulation:

1. Given a dataset X with n data points and d dimensions, we want to reduce the data
to a lower k-dimensional space.
2. The covariance matrix C is computed from the mean-centred data as C = (1/n) Σi (xi − x̄)(xi − x̄)ᵀ (or with 1/(n−1) for the sample covariance), where x̄ is the mean of the data points.
3. Compute the eigenvalues λ and eigenvectors v of the covariance matrix.


4. Sort the eigenvalues and select the eigenvectors corresponding to the largest
eigenvalues.
5. The reduced data is obtained by projecting the original data onto the selected
eigenvectors.

Applications of PCA:

● Dimensionality Reduction: PCA is widely used to reduce the number of features in high-dimensional data, which can improve the performance of machine learning algorithms.
● Data Visualization: By reducing the dimensions to 2 or 3, PCA helps visualize
high-dimensional data (e.g., in 2D or 3D scatter plots).
● Noise Reduction
● Feature Engineering

Advantages of PCA:

● Improved Efficiency: Reduces the number of features, leading to faster training of machine learning models.
● Noise Reduction: Eliminates less important features that could introduce noise.
● Interpretability: Helps in understanding the structure of the data by identifying the
directions of maximum variance.

Disadvantages of PCA:

● Loss of Information: Some information may be lost if too many principal components are discarded.
● Linear Assumption: PCA assumes that the data lies on a linear subspace, which
may not always be true.
● Hard to Interpret: Principal components are combinations of original features, which
can be difficult to interpret.

Ex1 :https://www.youtube.com/watch?v=ZtS6sQUAh0c

Ex2: https://www.youtube.com/watch?v=XO0US1aTA50

Kernel Principal Component Analysis (Kernel PCA) is an extension of PCA that allows
non-linear dimensionality reduction by mapping data into a higher-dimensional space using
kernel functions (like RBF, polynomial) before applying PCA. It is useful for data with
non-linear relationships, where traditional PCA fails.
Steps in Kernel PCA:

1. Kernel Mapping: Use a kernel function (e.g., RBF) to map the data to a
higher-dimensional space without explicitly transforming the data.
2. Compute the Kernel Matrix: Calculate the pairwise kernel values for all data points.
3. Center the Kernel Matrix: Subtract the mean row and column from the kernel
matrix.
4. Eigenvalue Decomposition: Perform eigenvalue decomposition to find the principal
components in the feature space.
5. Projection: Project the data onto the selected principal components to reduce
dimensionality.
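A minimal scikit-learn sketch of Kernel PCA with an RBF kernel (placeholder data; the gamma value is arbitrary):

import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.rand(100, 5)                                  # placeholder data
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0)   # RBF kernel mapping
X_reduced = kpca.fit_transform(X)                           # steps 2-5 handled internally
print(X_reduced.shape)                                      # (100, 2)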

Advantages:

● Can capture nonlinear relationships in the data.


● Flexible with different kernel functions.
● Useful for complex data patterns.

Disadvantages:

● Computationally expensive.
● Choice of kernel function can affect performance.
● Results may be hard to interpret in the high-dimensional space.
Applications:

● Non-linear dimensionality reduction.


● Pattern recognition and machine learning preprocessing.
● Visualizing complex, high-dimensional data.

tSNE (t-distributed stochastic neighbour embedding)

t-SNE is a powerful non-linear dimensionality reduction technique primarily used for the
visualization of high-dimensional data in two or three dimensions. It is particularly effective
for visualizing clusters or patterns in complex datasets, such as those arising in machine
learning or bioinformatics.

How t-SNE Works:

● Similarity Measurement (High-dimensional space): t-SNE starts by computing pairwise similarities between points in the high-dimensional space. It does this using conditional probabilities: the probability that a data point xj is a neighbor of point xi, modeled with a Gaussian distribution. The similarity between two points is based on the Euclidean distance between them:

Pij = exp(−||xi − xj||² / 2σ²) / Σ(k≠i) exp(−||xi − xk||² / 2σ²)

where Pij is the probability of point xj being a neighbor of point xi, and σ is a parameter that controls the width of the Gaussian.
● Similarity Measurement (Low-dimensional space): t-SNE then tries to embed the points into a low-dimensional space (typically 2D or 3D), where the similarities are modeled using a t-distribution with one degree of freedom (a Cauchy distribution) rather than a Gaussian. The heavier tails of the t-distribution help to separate clusters and reduce the crowding problem:

Qij = (1 + ||yi − yj||²)⁻¹ / Σ(k≠l) (1 + ||yk − yl||²)⁻¹

where Qij represents the similarity between points yi and yj in the low-dimensional space.

● Minimizing the Divergence (KL Divergence): The algorithm minimizes the Kullback-Leibler (KL) divergence between the probability distributions in the high-dimensional space (from step 1) and the low-dimensional space (from step 2). The objective is to make the pairwise similarities in the low-dimensional space as similar as possible to those in the original space.
● Optimization: t-SNE uses gradient descent to iteratively adjust the positions of points in the low-dimensional space to minimize the KL divergence.
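A short scikit-learn usage sketch (placeholder high-dimensional data; perplexity and random_state are arbitrary choices):

import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(200, 50)                      # placeholder high-dimensional data
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)                                 # (200, 2) embedding, ready for plotting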

Key Features:

● Preserves Local Structure: Keeps similar points close in the low-dimensional space.
● Non-linear Mapping: Captures complex, non-linear relationships, unlike PCA.
● Emphasizes Clusters: Effective at visualizing clusters and patterns in
high-dimensional data.

Advantages:

● Effective Visualization: Produces clear 2D/3D visualizations of patterns, clusters, and outliers.
● Non-linear: Captures non-linear relationships better than linear methods like PCA.
● Preserves Local Structure: Reveals similarities and structures in the data.

Disadvantages:

● Computationally Expensive: Slow and memory-intensive, especially for large datasets.
● Non-deterministic: Results can vary across runs.
● Distorts Global Structure: Doesn't preserve the overall distances between clusters.
● Hard to Interpret: Limited to 2D/3D visualization, which may not fully capture data
complexity.

Applications: Data Visualization,Clustering

Metrics and Error Correction in Unsupervised Learning

Internal Metrics (No ground truth required):

○ Silhouette Score: Measures how well-separated clusters are. Higher values indicate better clustering.
○ Davies-Bouldin Index: Measures cluster similarity. Lower values indicate
better clustering.
○ Dunn Index: Measures the compactness and separation of clusters. Higher
values are better.
○ Inertia: Measures the compactness of clusters. Lower inertia is better.

External Metrics (Requires ground truth):

○ Rand Index: Measures similarity between two clusterings. Higher values indicate better agreement with the ground truth.
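A hedged scikit-learn sketch of computing a few of these metrics (placeholder data and made-up ground-truth labels, purely for illustration; the adjusted Rand index is used here as the external measure):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, adjusted_rand_score

X = np.random.rand(200, 2)                                   # placeholder data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))                           # internal: higher is better
print(davies_bouldin_score(X, labels))                       # internal: lower is better
y_true = np.random.randint(0, 3, size=200)                   # hypothetical ground truth
print(adjusted_rand_score(y_true, labels))                   # external: higher is better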

Error Correction in Clustering:

● Model Re-Initialization: Re-initializing centroids to avoid local minima.


● Post-Processing: Merging/splitting clusters and removing noise to improve
clustering.
● Feature Selection/Extraction: Reducing dimensionality or selecting relevant
features to reduce noise.
● Consensus Clustering: Combining multiple clustering results to improve stability.
● Post-Clustering Adjustment: Using algorithms like EM for refining clusters.
● Supervised Refinement: Using labeled data to adjust clusters.

DIFFERENTIATE BETWEEN PCA, KERNEL PCA, t-SNE
