
Lecture 3 Types of Machine Learning

This document discusses unsupervised learning algorithms. It begins by defining unsupervised learning as involving unlabeled data where the desired results are unknown. Common unsupervised learning techniques include clustering, dimensionality reduction, and association rule mining. Clustering is described as grouping similar data objects into clusters while separating dissimilar objects. Popular clustering algorithms mentioned are k-means clustering, t-SNE, PCA, and association rules. Unsupervised learning is used for tasks like anomaly detection, extracting patterns from data, and gaining insights without labeled examples.


APEX INSTITUTE OF TECHNOLOGY
COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science)
Types of Learning Algorithm
20CSF-286
Prof. (Dr.) Paras Chawla (E5653)

Unit 1 : Machine Learning
DISCOVER . LEARN . EMPOWER


Unsupervised Learning
• Unsupervised learning does not involve direct control by the developer. Where the main point of supervised machine learning is that you know the desired results and need to fit the data to them, in unsupervised machine learning the desired results are unknown and still to be defined. Ex: recommender systems, buying habits.
• Another big difference between the two is that supervised learning uses labeled data exclusively, while unsupervised learning works on unlabeled data.

Unsupervised machine learning algorithms are used for:

• exploring the structure of the information;
• extracting valuable insights;
• detecting patterns.

In other words, unsupervised machine learning describes information by sifting through it and making sense of it.

The most widely used algorithms are listed below (a short usage sketch follows the list):
• k-means clustering
• t-SNE (t-Distributed Stochastic Neighbor Embedding)
• PCA (Principal Component Analysis)
• Association rules
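The following is a minimal sketch, not taken from the slides, of how the first three algorithms in the list can be run with scikit-learn on synthetic data; the dataset shape and all parameter values are illustrative assumptions. Association rule mining is usually handled by a separate library (e.g., mlxtend) and is omitted here.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic unlabeled data: 300 points in 10 dimensions (assumed shapes).
X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=0)

# k-means clustering: partition the points into 4 clusters.
cluster_ids = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# PCA: linear projection onto the 2 directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear 2-D embedding that preserves local neighbourhoods.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(cluster_ids[:10], X_pca.shape, X_tsne.shape)
```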
Unsupervised Learning

Unsupervised learning algorithms apply the following techniques to describe the data:
• Clustering: an exploration of the data that segments it into meaningful groups (i.e., clusters) based on internal patterns, without prior knowledge of the group definitions.
• The group definitions come from the similarity of individual data objects to one another and from aspects of their dissimilarity from the rest (which can also be used to detect anomalies).
• Dimensionality reduction: there is a lot of noise in the incoming data; machine learning algorithms use dimensionality reduction to remove this noise while distilling the relevant information (a PCA sketch follows below).
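To make the dimensionality-reduction bullet concrete, here is a minimal sketch (not from the slides) that projects noisy, high-dimensional data onto a few principal components with PCA and reconstructs it, discarding the low-variance "noise" directions. The data shapes and the choice of 5 components are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
signal = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))  # low-rank structure
noisy = signal + 0.1 * rng.normal(size=signal.shape)           # plus noise

pca = PCA(n_components=5)
compressed = pca.fit_transform(noisy)         # 200 x 5 compact representation
denoised = pca.inverse_transform(compressed)  # back to 200 x 50, noise reduced

print(compressed.shape, denoised.shape)
print("explained variance kept:", pca.explained_variance_ratio_.sum())
```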
Clustering
• Clustering is a technique for finding similarity groups in data, called clusters: it groups data instances that are similar to (near) each other into one cluster and puts data instances that are very different (far away) from each other into different clusters.
• Clustering is called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as would be the case in supervised learning.
• It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields including machine learning, pattern recognition, and image analysis.
• For example, a taxi agent might gradually develop a concept of "good traffic days" and "bad traffic days" without ever being given labeled examples of each by a teacher.

Unsupervised Learning Process Flow
The data has no labels. The machine just looks for whatever patterns it can find.

[Unsupervised learning model, shown as a flow diagram: training text, documents, images, etc. are converted into feature vectors and fed to a machine learning algorithm; new text, documents, images, etc. are converted into feature vectors and passed through the resulting predictive model, which outputs a likelihood, a cluster ID, or a better representation.]
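A minimal sketch of that flow in code (not from the slides; the documents, vectorizer, and number of clusters are all illustrative assumptions): raw documents become feature vectors, an unsupervised model is trained on them, and new documents are mapped to cluster IDs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

train_docs = ["cats purr softly", "dogs bark loudly", "stock prices rise", "markets fall sharply"]
new_docs = ["my cat sleeps", "the market is volatile"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)   # training feature vectors

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

X_new = vectorizer.transform(new_docs)           # feature vectors for unseen text
print(model.predict(X_new))                      # cluster ID for each new document
```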
Unsupervised Learning vs. Supervised Learning
The only difference is the labels in the training data

[Comparison diagram. Unsupervised learning: training text, documents, images, etc. → feature vectors → machine learning algorithm (no labels) → predictive model; new text, documents, images, etc. → feature vectors → likelihood, cluster ID, or better representation. Supervised learning: training text, documents, images, etc. → feature vectors plus labels → machine learning algorithm → predictive model; new text, documents, images, etc. → feature vectors → expected label.]
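A minimal, illustrative contrast of the two fits (the example data and estimators are my own choices, not the slides'): the unsupervised estimator is trained on features alone, while the supervised one also needs labels.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
y = [0, 0, 1, 1]                 # labels exist only in the supervised case

unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # no labels
supervised = LogisticRegression().fit(X, y)                            # needs labels

print(unsupervised.predict([[0.1, 0.1]]))  # a cluster ID
print(supervised.predict([[0.1, 0.1]]))    # an expected label
```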
Unsupervised Learning: Example
Clustering like-looking birds/animals based on their features

Application of Unsupervised Learning
Unsupervised learning can be used for anomaly detection as well as clustering.

[Figure: two scatter plots — one illustrating anomaly detection, where a few isolated points lie far from the dense cloud, and one illustrating identifying similarities in groups (clustering).]
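A minimal anomaly-detection sketch (the slides do not name a specific algorithm, so IsolationForest is used here as one common unsupervised choice; the data and contamination rate are assumptions):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))   # dense "normal" cloud
outliers = rng.uniform(low=4.0, high=6.0, size=(5, 2))   # a few far-away points
X = np.vstack([normal, outliers])

detector = IsolationForest(contamination=0.03, random_state=0).fit(X)
labels = detector.predict(X)   # +1 = inlier, -1 = anomaly

print("anomalies found:", int((labels == -1).sum()))
```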
Clustering
Grouping objects based on the information found in the data that describes the objects or their relationships.

The goal is for similar objects to be grouped into one cluster and to be different from the objects in other clusters.

[Figure: scatter plot of points coloured by cluster, labelled Cluster 0 through Cluster 4.]
Need for Clustering
To determine the intrinsic grouping in a set of unlabeled data

To organize data into clusters showing internal structure of the data

To partition the data points

To understand and extract value from large sets of structured and unstructured data
Types of Clustering
Clustering methods fall into two families:

• Hierarchical clustering: Agglomerative, Divisive
• Partitional clustering: K-means, Fuzzy C-means
Hierarchical Clustering
Outputs a hierarchy, a structure that is more informative than the unstructured set of clusters returned by flat clustering

[Figure: four snapshots of agglomerative clustering over points A–F, with dissimilarity on the vertical axis of each dendrogram.
1. A and B are combined based on similarity; D and E are combined based on similarity.
2. The combination of A and B is combined with C.
3. The combination of D and E is combined with F.
4. The final tree contains all clusters, combined into a single cluster.]
Working: Hierarchical Clustering

Step 1: Assign each item to its own cluster, so that if you have N items you start with N clusters.
Step 2: Find the closest (most similar) pair of clusters and merge them into a single cluster; you now have one cluster fewer.
Step 3: Compute the distances (similarities) between the new cluster and every old cluster.
Step 4: Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.
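Below is a minimal from-scratch sketch of the four steps, not from the slides; it uses single linkage (minimum pairwise distance) and a handful of toy 2-D points, both of which are my own illustrative choices.

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [9.0, 9.0]])

# Step 1: every item starts in its own cluster.
clusters = [[i] for i in range(len(points))]

def single_linkage(c1, c2):
    """Minimum pairwise distance between two clusters of point indices."""
    return min(np.linalg.norm(points[i] - points[j]) for i in c1 for j in c2)

# Step 4: repeat until a single cluster of size N remains.
while len(clusters) > 1:
    # Step 2: find the closest pair of clusters and merge them.
    pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
    a, b = min(pairs, key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]))
    merged = clusters[a] + clusters[b]
    # Step 3: distances to the new cluster are recomputed on the next pass.
    clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    print("merged ->", clusters)
```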
Distance Measures
Complete-linkage clustering
• Find the maximum possible distance between points belonging to two different clusters.

Single-linkage clustering
• Find the minimum possible distance between points belonging to two different clusters.

Mean-linkage clustering
• Find all possible pair-wise distances for points belonging to two different clusters and then calculate the average.

Centroid-linkage clustering
• Find the centroid of each cluster and calculate the distance between the centroids.
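As a quick way to try the four linkage criteria above, here is a minimal SciPy sketch on toy data (the points and the choice of cutting the tree into 2 clusters are assumptions; SciPy's "average" method corresponds to mean linkage):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]], dtype=float)

for method in ["complete", "single", "average", "centroid"]:
    Z = linkage(X, method=method)                     # merge history (dendrogram data)
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
    print(method, labels)
```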
The Dendrogram
A dendrogram (in Greek, dendro means tree and gramma means drawing) is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering.

[Figure: example dendrogram; agglomerative clustering builds the tree bottom-up by merging clusters, while divisive clustering builds it top-down by splitting.]
Hierarchical Clustering: Example
A hierarchical clustering of distances between cities in kilometers

[Figure: map of the cities BA, FI, MI, NA, RM and TO, and the dendrogram built from their pairwise distances, with leaf order BA, NA, RM, FI, TO, MI.]
Hierarchical Clustering: Step 1
Create the distance matrix of the data (distances in kilometers):

        BA    FI    MI    NA    RM    TO
  BA     0   662   877   255   412   996
  FI   662     0   295   468   268   400
  MI   877   295     0   754   564   138
  NA   255   468   754     0   219   869
  RM   412   268   564   219     0   669
  TO   996   400   138   869   669     0

(138 is the distance between TO and MI.)
Hierarchical Clustering: Step 2
From the distance matrix, you can see that MI and TO have the smallest distance (138), so they are merged into the cluster MI/TO:

        BA    FI    MI    NA    RM    TO
  BA     0   662   877   255   412   996
  FI   662     0   295   468   268   400
  MI   877   295     0   754   564   138
  NA   255   468   754     0   219   869
  RM   412   268   564   219     0   669
  TO   996   400   138   869   669     0

After merging MI and TO:

          BA    FI  MI/TO   NA    RM
  BA       0   662   877   255   412
  FI     662     0   295   468   268
  MI/TO  877   295     0   754   564
  NA     255   468   754     0   219
  RM     412   268   564   219     0

[Dendrogram so far: TO and MI joined.]

As the MI column has lower values than the TO column, the MI/TO row and column take the MI values (i.e., single linkage: the merged cluster keeps the minimum of the MI and TO distances to every other city).
Hierarchical Clustering: Step 3
Repeat the clustering until a single cluster containing all the members is obtained.

NA and RM are now the closest pair (219), so they are merged into NA/RM:

          BA    FI  MI/TO   NA    RM
  BA       0   662   877   255   412
  FI     662     0   295   468   268
  MI/TO  877   295     0   754   564
  NA     255   468   754     0   219
  RM     412   268   564   219     0

[Dendrogram so far: NA/RM and TO/MI branches.]

          BA    FI  MI/TO  NA/RM
  BA       0   662   877    255
  FI     662     0   295    268
  MI/TO  877   295     0    564
  NA/RM  255   268   564      0
Hierarchical Clustering: Step 3 (Contd.)

BA and NA/RM are now the closest pair (255), so they are merged into BA/(NA/RM):

          BA    FI  MI/TO  NA/RM
  BA       0   662   877    255
  FI     662     0   295    268
  MI/TO  877   295     0    564
  NA/RM  255   268   564      0

[Dendrogram so far: (NA, RM) and (TO, MI) branches.]

               BA/(NA/RM)    FI  MI/TO
  BA/(NA/RM)            0   268    564
  FI                  268     0    295
  MI/TO               564   295      0
Hierarchical Clustering: Step 3 (Contd.)

FI and BA/(NA/RM) are now the closest pair (268), so they are merged into BA/(NA/RM)/FI:

               BA/(NA/RM)    FI  MI/TO
  BA/(NA/RM)            0   268    564
  FI                  268     0    295
  MI/TO               564   295      0

[Dendrogram so far: ((NA, RM), BA, FI) branch and (TO, MI) branch.]

                    BA/(NA/RM)/FI  MI/TO
  BA/(NA/RM)/FI                 0    295
  MI/TO                       295      0
Hierarchical Clustering: Step 4
Derive the final dendrogram. The two remaining clusters, BA/(NA/RM)/FI and MI/TO, are merged at distance 295:

                    BA/(NA/RM)/FI  MI/TO
  BA/(NA/RM)/FI                 0    295
  MI/TO                       295      0

[Final dendrogram: leaf order BA, NA, RM, FI, TO, MI.]
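As a sanity check on the worked example, here is a minimal SciPy sketch (not from the slides) that runs single linkage, which matches the minimum-distance merging used above, directly on the same distance matrix:

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([
    [  0, 662, 877, 255, 412, 996],
    [662,   0, 295, 468, 268, 400],
    [877, 295,   0, 754, 564, 138],
    [255, 468, 754,   0, 219, 869],
    [412, 268, 564, 219,   0, 669],
    [996, 400, 138, 869, 669,   0],
], dtype=float)

Z = linkage(squareform(D), method="single")  # condensed distances -> merge history
print(Z)  # merge distances should come out as 138, 219, 255, 268, 295, as in the slides
# dendrogram(Z, labels=cities)               # draws the tree if matplotlib is available
```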
K-means Algorithm: Steps

1. Randomly choose k datapoints as the initial centroids.

2. Assign each datapoint to its closest centroid.

3. Recalculate each cluster centroid as the mean of the points assigned to it.

4. Check whether the convergence criterion is met; if not, repeat from step 2.
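A minimal from-scratch sketch of these four steps (the toy data, k = 2, and the iteration cap are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
k = 2

# Step 1: randomly choose k datapoints as the initial centroids.
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(100):
    # Step 2: assign each datapoint to its closest centroid.
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Step 3: recalculate each centroid as the mean of its assigned points.
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # Step 4: stop when the centroids no longer move (convergence).
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)
```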


K-means: Example
Consider the below datapoints
K-means: Example (Contd.)
Initialize centers randomly
K-means: Example (Contd.)
Assign points to the nearest center
K-means: Example (Contd.)
Readjust centers
K-means: Example (Contd.)
Assign points to the nearest center
K-means: Example (Contd.)
Readjust centers
K-means: Example (Contd.)
Assign points to the nearest center
K-means: Example (Contd.)
Readjust centers
K-means: Example (Contd.)
Assign points to the nearest center
Optimal Number of Clusters

Objective Function Value (SSE, i.e., distortion)

If you plot k against the SSE, you will see that the error decreases as k increases. This is because the clusters become smaller as their number grows, so the distortion within each cluster is also smaller.

The goal of the elbow method is to choose the k at the "elbow" of the curve, i.e., the point after which the SSE stops decreasing abruptly.

[Figure: elbow plot of SSE (distortion) versus k.]
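A minimal elbow-method sketch (synthetic data; the range of k values and blob parameters are illustrative assumptions) that plots k against the k-means SSE (scikit-learn's inertia_):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=0)

ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("SSE (distortion)")
plt.title("Elbow plot")
plt.show()
```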
Knowledge Check
Knowledge Check 1
Can decision trees be used for performing clustering?

a. True
b. False
Knowledge Check 1
Can decision trees be used for performing clustering?

a. True
b. False

The correct answer is a. True

Decision trees can also be used to form clusters in the data, but clustering often generates natural clusters and is not dependent on any objective function.
Knowledge Check 2
Which of the following can act as possible termination conditions in K-means?
1. A fixed number of iterations.
2. Observations keep the same cluster assignments between iterations (except for cases with a bad local minimum).
3. Centroids remain stationary between successive iterations.
4. The RSS falls below a threshold.

a. 1, 3, and 4

b. 1, 2, and 3

c. 1, 2, and 4

d. All of the above


Knowledge Check 2
Which of the following can act as possible termination conditions in K-means?
1. A fixed number of iterations.
2. Observations keep the same cluster assignments between iterations (except for cases with a bad local minimum).
3. Centroids remain stationary between successive iterations.
4. The RSS falls below a threshold.

a. 1, 3, and 4

b. 1, 2, and 3

c. 1, 2, and 4

d. All of the above

The correct answer is d. All of the above

All of the above options are valid termination conditions for K-means.
THANK YOU
For queries, write to: [email protected]
