
Machine Learning

Chapter 2 Clustering
Dr. Minhhuy Le
EEE, Phenikaa University
Chapter 2: Clustering
1. Decision tree review
2. Clustering intuition
3. K-means algorithm
4. Summary
1. Random Forest Review
Example:
f: <Outlook, Temperature, Humidity, Wind> => PlayTennis?



1. Random Forest Review
ID3 approach: a natural greedy approach to growing a decision tree top-down, from the root to the leaves, by repeatedly replacing an existing leaf with an internal node.

Algorithm:
• Pick “best” attribute to split at the root based on training data.
• Recurse on children that are impure (e.g., contain both Yes and No examples).

Key question: Which attribute is best?



1. Random Forest Review
ID3 approach: Select attribute with highest information gain (IG)

The information gain of attribute A is the expected reduction in the entropy of the target variable Y for data sample S, due to sorting S on attribute A.
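In the standard ID3 notation (where S_v denotes the subset of S for which attribute A takes value v), this is:

\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c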



1. Random Forest Review
ID3 steps:

1. Calculate the entropy of the target variable.
2. Calculate the information gain (IG) of each feature.
3. Choose the feature with the largest IG as the root node.
4. If a branch has entropy = 0 it becomes a leaf; if ≠ 0 it is split further.
5. Repeat until all data are classified.
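A minimal Python sketch of steps 1–2, the entropy and information-gain computations; the helper names and the toy PlayTennis-style data below are illustrative, not taken from the slides:

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (base 2) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, labels):
    # Expected reduction in entropy of `labels` from splitting on `attribute`.
    # `examples` is a list of dicts mapping attribute names to values.
    n = len(labels)
    by_value = {}
    for ex, y in zip(examples, labels):
        by_value.setdefault(ex[attribute], []).append(y)
    remainder = sum(len(subset) / n * entropy(subset) for subset in by_value.values())
    return entropy(labels) - remainder

# Toy usage (hypothetical data):
X = [{"Outlook": "Sunny", "Wind": "Weak"},
     {"Outlook": "Sunny", "Wind": "Strong"},
     {"Outlook": "Overcast", "Wind": "Weak"},
     {"Outlook": "Rain", "Wind": "Weak"}]
y = ["No", "No", "Yes", "Yes"]
print(information_gain(X, "Outlook", y))  # the attribute with the largest IG becomes the root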

Hyperparameters



1. Random Forest Review
Random Forest steps:

1. Select random samples from a given dataset.
2. Construct a decision tree for each sample and get a prediction result from each decision tree.
3. Perform a vote for each predicted result.
4. Select the prediction result with the most votes as the final prediction.
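As a usage sketch (assuming scikit-learn is available; the toy dataset is illustrative), these steps roughly correspond to what RandomForestClassifier does internally:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the training data;
# the forest combines the trees' votes to produce the final prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))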



2. Clustering Intuition
Supervised learning: the training set contains labeled examples (inputs x with labels y).
Unsupervised learning: the training set contains only inputs x; there is no label data (y).



2. Clustering Intuition
Clustering: finding structure in the data
• by isolating groups of examples that are similar in some well-defined sense
• unsupervised learning: only input data, no label information

How many clusters is best? That depends on the measure of similarity (or distance) between the data points to be clustered.



3. K-means algorithm
Clustering methods:
• Hierarchical clustering methods
• Spectral clustering
• Semi-supervised clustering
• Clustering by dynamics
• Flat clustering methods: k-means clustering
• Etc.



3. K-means algorithm
Procedure:
1. Pick k arbitrary centroids (cluster means).
2. Assign each sample to its closest centroid.
3. Adjust each centroid to be the mean of the examples assigned to it.
4. Repeat steps 2–3 until the assignments no longer change.

With this stopping rule, the K-means algorithm is guaranteed to converge in a finite number of iterations.
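A minimal NumPy sketch of this loop (the "pick k random samples" initialization and the toy data are assumptions for illustration, not from the slides):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Pick k arbitrary centroids: here, k distinct samples chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assignments = None
    for _ in range(max_iter):
        # 2. Assign each sample to its closest centroid (squared Euclidean distance).
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assignments = distances.argmin(axis=1)
        # 4. Stop when the assignments no longer change.
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # 3. Move each centroid to the mean of the samples assigned to it.
        for j in range(k):
            if np.any(assignments == j):
                centroids[j] = X[assignments == j].mean(axis=0)
    return centroids, assignments

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = kmeans(X, k=2)
print(centroids)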



3. K-means algorithm
Procedure illustration: (figures on the original slides)


3. K-means algorithm
How to choose k?

The clustering could be repeated several times (e.g., with different initializations) and the best solution kept.



3. K-means algorithm
How to choose k? “Elbow” method
• As k grows (smaller clusters), the distortion score J decreases; however, the model easily overfits.
• k should be chosen at the “elbow”, beyond which increasing k no longer reduces J significantly.

J: distortion score, i.e., the sum of squared distances from each sample to its assigned centroid.
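A minimal sketch of the elbow computation, assuming scikit-learn, where the fitted model's inertia_ attribute plays the role of the distortion score J; the toy data are illustrative:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 1.0, (50, 2)) for loc in (0, 5, 10)])  # three toy blobs

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)  # plot k vs. J and pick k at the "elbow"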



4. Summary

• K-means is a parametric method, where the parameters are the prototypes (centroids).
• Inflexible: the decision boundaries between clusters are linear.
• Fast! The update steps can be parallelized.
• There are several variations on the basic K-means algorithm:
  1. K-means++ gives a more principled way to initialize the cluster centroids.
  2. K-medoids chooses the centermost datapoint in the cluster as the prototype instead of the centroid. (The centroid may not correspond to a datapoint.)
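For reference, scikit-learn's KMeans supports the K-means++ initialization (it is the default); a minimal sketch with toy data:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))  # illustrative data
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the learned prototypes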

