Unit IV
The task of grouping data points based on their similarity with each other is
called Clustering or Cluster Analysis.
This method falls under the branch of Unsupervised Learning, which aims at gaining insights from unlabelled data points; that is, unlike supervised learning, we do not have a target variable.
For example, in the graph given below, we can clearly see that there are 3 circular clusters forming on the basis of distance.
In another example graph, we can see that the clusters formed are not circular in shape.
Types of Clustering
Hard Clustering: In this type of clustering, each data point either belongs to a cluster completely or does not belong to it at all. For example, suppose there are four data points and we have to group them into two clusters; each data point will then belong to either cluster 1 (C1) or cluster 2 (C2).
Data Point    Cluster
A             C1
B             C2
C             C2
D             C1
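As a minimal sketch of hard assignment (assuming scikit-learn is available; the 2-D coordinates for A-D are made up purely for illustration), k-means gives each point exactly one cluster label:

import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D coordinates for the four points A, B, C, D.
points = np.array([[1.0, 1.0],   # A
                   [9.0, 9.0],   # B
                   [8.5, 9.5],   # C
                   [1.5, 0.5]])  # D

# Hard clustering: fit_predict returns exactly one cluster id per point.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)   # e.g. [0 1 1 0] -> A and D in one cluster, B and C in the other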
Soft Clustering: In this type of clustering, instead of assigning each data point to exactly one cluster, a probability or likelihood of that point belonging to each cluster is evaluated. For example, suppose there are four data points and we have to group them into two clusters; we then evaluate, for every data point, the probability of it belonging to each of the two clusters.
Data Point    Probability of C1    Probability of C2
A             0.91                 0.09
B             0.3                  0.7
C             0.17                 0.83
D             1                    0
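A minimal sketch of soft assignment (again with made-up coordinates and assuming scikit-learn): a Gaussian mixture model returns, for every point, a probability of belonging to each cluster:

import numpy as np
from sklearn.mixture import GaussianMixture

# Made-up coordinates for the four points A, B, C, D.
points = np.array([[1.0, 1.0],   # A
                   [6.0, 6.5],   # B
                   [8.5, 9.5],   # C
                   [0.5, 0.5]])  # D

# Soft clustering: predict_proba returns, for every point, the probability
# of belonging to each of the two clusters (each row sums to 1).
gm = GaussianMixture(n_components=2, random_state=0).fit(points)
print(gm.predict_proba(points).round(2))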
K-means Clustering
K-means partitions the data into k clusters by repeatedly assigning each point to its nearest centroid and then recomputing each centroid as the mean of the points assigned to it.
Example data points: A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
For instance, if cluster 1 currently contains A1(2, 10) and A8(4, 9), its updated centroid is C1(3, 9.5), the mean of those two points.
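Below is a minimal sketch of running k-means on these eight points (assuming scikit-learn; k = 3 and the random seed are illustrative choices, not taken from the original notes):

import numpy as np
from sklearn.cluster import KMeans

# The eight sample points A1..A8 from above.
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)

# k = 3 and the random seed are illustrative assumptions.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("labels:", km.labels_)
print("centroids:")
print(km.cluster_centers_)
# Each centroid is the mean of the points assigned to it; for example the
# mean of A1(2, 10) and A8(4, 9) is (3, 9.5).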
K-medoids Clustering
Medoids as Centers: Unlike k-means, which uses the mean of the points in a cluster as the center, k-medoids selects actual data points as the centers (medoids). This makes the cluster centers more interpretable.
1. Choose k random points from the data and assign them as the initial medoids of the k clusters.
2. For each of the remaining data points, calculate the distance to every medoid and assign the point to the cluster of the nearest medoid.
3. Calculate the total cost (the sum of the distances from all the data points to their nearest medoids).
4. Select a random non-medoid point and swap it with one of the current medoids. Repeat steps 2 and 3.
5. If the total cost with the new medoid is less than with the previous medoid, keep the new medoid and repeat step 4.
6. If the total cost with the new medoid is greater than with the previous medoid, undo the swap and repeat step 4.
7. Continue the repetitions until no swap reduces the total cost, i.e., the medoids no longer change.
Cost((3, 4), (2, 6)) = |3 - 2| + |4 - 6| = 1 + 2 = 3 (Manhattan distance)
Total cost = 3 + 4 + 4 + 3 + 1 + 1 + 2 + 2 = 20
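Below is a naive sketch that implements the swap procedure described above with Manhattan distance (the function names and the reuse of the A1-A8 sample points are illustrative assumptions, not a library API):

import numpy as np

def total_cost(X, medoids):
    # Sum of Manhattan distances from every point to its nearest medoid.
    d = np.abs(X[:, None, :] - X[medoids][None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

def k_medoids(X, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))   # step 1
    best = total_cost(X, medoids)                                # steps 2-3
    improved = True
    while improved:                                              # step 7
        improved = False
        for m in range(k):
            for p in range(len(X)):
                if p in medoids:
                    continue
                trial = medoids.copy()
                trial[m] = p                                     # step 4: swap in a non-medoid
                cost = total_cost(X, trial)
                if cost < best:                                  # step 5: keep the cheaper swap
                    best, medoids, improved = cost, trial, True
                # step 6: otherwise the swap is simply discarded
    labels = np.abs(X[:, None, :] - X[medoids][None, :, :]).sum(axis=2).argmin(axis=1)
    return medoids, labels, best

# Reusing the A1..A8 sample points purely for illustration.
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
print(k_medoids(X, k=2))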
Hierarchical Clustering
Types:
Agglomerative (Bottom-Up): Starts with each data point as its own cluster and
merges the closest pairs of clusters iteratively until all points are in a single
cluster or a stopping criterion is met.
Steps:
Consider each letter (A-F) as a single cluster and calculate the distance of each cluster from all the other clusters.
In the second step, the closest clusters are merged to form a single cluster. Let's say cluster (B) and cluster (C) are very similar to each other, so we merge them in this step; similarly, cluster (D) and cluster (E) are merged, and at last we get the clusters [(A), (BC), (DE), (F)].
We recalculate the proximity (which measures the similarity or dissimilarity between clusters) according to the algorithm and merge the two nearest clusters ([(DE), (F)]) together to form the new clusters [(A), (BC), (DEF)].
Repeating the same process, the clusters (DEF) and (BC) are now the closest and are merged to form a new cluster. We're now left with the clusters [(A), (BCDEF)].
At last, the two remaining clusters are merged together to form a single cluster
[(ABCDEF)].
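A minimal sketch of agglomerative clustering (assuming SciPy is available; the one-dimensional values standing in for points A-F are made up):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 1-D values standing in for the points A..F.
points = np.array([[1.0], [5.0], [5.2], [9.0], [9.1], [11.0]])

# Bottom-up: repeatedly merge the two nearest clusters (single linkage).
Z = linkage(points, method='single')
print(Z)                                       # each row records one merge step
print(fcluster(Z, t=3, criterion='maxclust'))  # cut the hierarchy into 3 flat clusters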
Divisive (Top-Down): Begins with all data points in one cluster and recursively
splits them into smaller clusters.
Advantages: The number of clusters does not have to be specified in advance, and the resulting hierarchy (dendrogram) shows the clustering at every level of granularity.
Disadvantages: It is computationally expensive for large datasets, and a merge or split, once made, cannot be undone, so an early mistake propagates through the rest of the hierarchy.
Multi-view clustering
Multi-view clustering groups data points that are described by several different views (feature sets). Each view might contain different information about the data points, and combining these views can lead to more accurate and robust clustering results.
Challenges: One of the main challenges is how to effectively integrate and align these different views, especially when they have varying levels of noise and completeness.
Another challenge is balancing view consistency (ensuring the views agree with each other) and view specificity (capturing unique information from each view).
For example, some methods use graph learning to capture the relationships between data points across different views, while others use contrastive learning to align representations from different views.
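As a minimal sketch of one simple way to combine views (synthetic data, assuming scikit-learn; real multi-view methods such as graph learning or contrastive learning are far more sophisticated), we can average per-view affinity matrices and run spectral clustering on the fused affinity:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

# Synthetic data: 60 points with 3 underlying groups, described by two views.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1, 2], 20)
view1 = groups[:, None] * 3.0 + rng.normal(size=(60, 2))   # view 1: 2 features
view2 = groups[:, None] * 2.0 + rng.normal(size=(60, 4))   # view 2: 4 features

# Fuse the views by averaging their RBF affinity matrices, then cluster
# the fused affinity with spectral clustering.
affinity = (rbf_kernel(view1) + rbf_kernel(view2)) / 2
pred = SpectralClustering(n_clusters=3, affinity='precomputed',
                          random_state=0).fit_predict(affinity)
print(pred)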