Lect 2 Supervised and Unsupervised Learning
STAT669
Supervised learning vs. unsupervised learning
Clustering
• Clustering is a technique for finding similarity groups
in data, called clusters. That is,
– it groups data instances that are similar to (near) each other
into one cluster and puts data instances that are very different
(far away) from each other into different clusters.
• Clustering is often called an unsupervised learning
task because no class values denoting an a priori grouping
of the data instances are given, as they are in
supervised learning.
• For historical reasons, clustering is often
considered synonymous with unsupervised learning.
– In fact, association rule mining is also unsupervised.
• This lecture focuses on clustering.
An illustration
• The data set has three natural groups of data points,
i.e., 3 natural clusters.
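A minimal Python sketch (an illustrative addition with invented cluster centres, not material from the lecture) that generates a comparable 2-D data set with three natural groups:

import random

random.seed(0)

# Three well-separated cluster centres in 2-D (illustrative values)
centres = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]

# Draw 50 points around each centre with small Gaussian noise,
# giving a data set with three natural clusters
data = [(cx + random.gauss(0, 0.8), cy + random.gauss(0, 0.8))
        for cx, cy in centres
        for _ in range(50)]

print(len(data))  # 150 points forming 3 natural groups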
Aspects of clustering
• A clustering algorithm
– Partitional clustering
– Hierarchical clustering
• A distance (similarity, or dissimilarity) function,
e.g., Euclidean distance (see the sketch after this list)
• Clustering quality
– Inter-cluster distance maximized
– Intra-cluster distance minimized
• The quality of a clustering result depends on
the algorithm, the distance function, and the
application.
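For concreteness, here is a minimal Python sketch of one common choice of distance function, Euclidean distance; the function name is illustrative and not taken from the lecture.

import math

def euclidean_distance(x, y):
    # Straight-line distance between two equal-length numeric vectors
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0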
K-means clustering
• K-means is a partitional clustering algorithm
• Let the set of data points (or instances) D be
{x1, x2, …, xn},
where xi = (xi1, xi2, …, xir) is a vector in a real-valued
space X ⊆ R^r, and r is the number of attributes
(dimensions) in the data.
• The k-means algorithm partitions the given data
into k clusters.
– Each cluster has a cluster center, called centroid.
– k is specified by the user
K-means algorithm
Stopping/convergence criteria:
1. no (or minimum) re-assignments of data points to
different clusters,
2. no (or minimum) change of centroids, or
3. minimum decrease in the sum of squared error (SSE),

   SSE = \sum_{j=1}^{k} \sum_{x \in C_j} dist(x, m_j)^2        (1)

where C_j is the j-th cluster, m_j is the centroid of C_j (the
mean vector of all the data points in C_j), and dist(x, m_j) is
the distance between data point x and centroid m_j.
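A minimal Python sketch of the standard k-means loop under the stopping criteria above, assuming Euclidean distance and data points given as numeric tuples; all names are illustrative rather than taken from the lecture.

import math
import random

def dist(x, m):
    # Euclidean distance between a data point and a centroid
    return math.sqrt(sum((xi - mi) ** 2 for xi, mi in zip(x, m)))

def kmeans(D, k, max_iter=100):
    # 1. Choose k data points at random as the initial centroids
    centroids = random.sample(D, k)
    for _ in range(max_iter):
        # 2. Assign each point to the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for x in D:
            j = min(range(k), key=lambda c: dist(x, centroids[c]))
            clusters[j].append(x)
        # 3. Recompute each centroid as the mean of its assigned points
        new_centroids = [
            tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        # 4. Stop when the centroids no longer change (criterion 2 above);
        #    criteria 1 and 3 could be checked instead by tracking the
        #    assignments or the SSE across iterations.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sum of squared error, as in equation (1)
    sse = sum(dist(x, centroids[j]) ** 2
              for j, c in enumerate(clusters) for x in c)
    return centroids, clusters, sse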
An example
[Figure: k-means iterations on a 2-D data set; '+' marks the cluster centroids]
An example (cont …)
Strengths of k-means
• Strengths:
– Simple: easy to understand and to implement
– Efficient: Time complexity: O(tkn),
where n is the number of data points,
k is the number of clusters, and
t is the number of iterations.
– Since both k and t are usually small, k-means is
considered a linear algorithm.
• K-means is the most popular clustering algorithm.
• Note that k-means terminates at a local optimum if SSE is
used; finding the global optimum is computationally
intractable.
Weaknesses of k-means
• The algorithm is only applicable if the mean is defined.
– For categorical data, the k-modes algorithm can be used:
the centroid is represented by the most frequent value
(mode) of each attribute.
• The user needs to specify k.
• The algorithm is sensitive to outliers (see the sketch
after this list)
– Outliers are data points that are very far away from
other data points.
– Outliers could be errors in the data recording or
special data points with very different values.
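As a quick illustration of this outlier sensitivity (the values below are invented for the example, not from the lecture), a single extreme point can drag a cluster mean far away from the bulk of the data:

def mean(points):
    # Component-wise mean of a list of 2-D points
    return tuple(sum(vals) / len(points) for vals in zip(*points))

cluster = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0)]
print(mean(cluster))                   # about (1.03, 1.00)
print(mean(cluster + [(50.0, 50.0)]))  # pulled to about (10.8, 10.8)

Common remedies include removing such points during preprocessing or representing a cluster by an actual data point (a medoid) instead of the mean.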
K-means summary
• Despite its weaknesses, k-means is still the most
popular clustering algorithm due to its simplicity and
efficiency.
– Other clustering algorithms have their own lists of
weaknesses.
• No clear evidence that any other clustering
algorithm performs better in general
– although they may be more suitable for some specific
types of data or applications.
• Comparing different clustering algorithms is a
difficult task. No one knows the correct clusters!
Hierarchical Clustering
• Produces a nested sequence of clusters, a tree,
also called a dendrogram.
Agglomerative clustering
It is more popular than divisive methods.
• At the beginning, each data point forms
its own cluster (also called a node).
• Merge the nodes/clusters that have the least
distance between them.
• Continue merging.
• Eventually all nodes belong to one
cluster (a minimal sketch follows this list).
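Here is a minimal Python sketch of this agglomerative (bottom-up) procedure, assuming Euclidean distance between points and single-link (closest-pair) distance between clusters; the names and the single-link choice are illustrative assumptions, not prescribed by the lecture.

import math

def point_dist(x, y):
    # Euclidean distance between two numeric tuples
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def single_link(c1, c2):
    # Distance between two clusters = distance of their closest pair of points
    return min(point_dist(x, y) for x in c1 for y in c2)

def agglomerative(D):
    # Start with every data point in its own cluster (node)
    clusters = [[x] for x in D]
    merges = []  # records the merge order, i.e. the dendrogram structure
    while len(clusters) > 1:
        # Find the pair of clusters with the least distance and merge them
        i, j = min(((a, b) for a in range(len(clusters))
                           for b in range(a + 1, len(clusters))),
                   key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

For example, agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)]) merges the two nearby pairs first and joins the resulting groups last.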
Confusion matrix