Digital Computer Concept and Practice: Unsupervised Learning
Soohyun Yang
College of Engineering
Department of Civil and Environmental Engineering
Types of ML techniques – All learning is learning!
• Supervised learning (“Presence of labels”) : Regression, Classification
  e.g., Advertisement popularity, Spam classification, Face recognition
• Unsupervised learning (“Absence of labels”) : Clustering <= Our scope
  e.g., Recommender systems (YT), Buying habits (group customers), Grouping user logs
• Reinforcement learning (“Behavior-driven : feedback loop”)
  e.g., Learning to play games (AlphaGo), Industrial simulation, Resource management
https://towardsdatascience.com/what-are-the-types-of-machine-learning-e2b9e5d1756f
Unsupervised learning
[Figure: cartoon illustrating unsupervised learning; sources: Doshi (2020), https://twitter.com/athena_schools/status/1063013435779223553]
Clustering
Aim : To find a natural grouping in data, such that items in the same cluster
are more similar to each other than to items from different clusters.
=> High within-cluster similarity & low inter-cluster similarity
In unsupervised learning :
• We work with unlabeled samples
• No need to split the data into training and test sets
Representative algorithms :
1) K-means clustering => Prototype-based clustering
2) Agglomerative clustering => Hierarchical clustering
3) DBSCAN (Density-Based Spatial Clustering of Applications with Noise) => Density-based clustering
K-means algorithm
Prototype-based clustering :
Each cluster is represented by a prototype, which is the centroid
(average) of features of samples within the same cluster.
Advantages :
• Very easy to implement
• Computationally very efficient compared to other clustering algorithms
Disadvantages :
• Have to specify the number of clusters, k, in advance
• An inappropriate choice of k can lead to poor clustering performance
K-means algorithm : Principles
Example : 12 randomly generated samples with 2 features
1. Define the number of clusters, k, to group the data (here, k=3)
2. Select k random points within the data => initial centroids!
3. Calculate the (Euclidean) distance between each centroid and the other points
[Figure: scatter plot of the 12 samples, showing the distances d0, d1, d2 from one sample to the three initial centroids]
Initial centroids :
Feature 1    Feature 2
1.47         0.32
8.40         9.47
2.10         -0.22
4. Assign each point to the nearest centroid
   (e.g., the highlighted point joins cluster C1, because d1 < d0 < d2)
5. Calculate the center of each cluster and move the centroids => update!
6. Repeat steps 3-5 until the centroids no longer change.
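The loop above is short enough to sketch directly. Below is a minimal NumPy implementation of steps 1-6, written for this lecture as an illustration; the function and variable names (kmeans, X, centroids, labels) are assumptions, not part of any library.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal k-means sketch: returns (labels, centroids) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k random samples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: Euclidean distance from every point to every centroid -> shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 4: assign each point to its nearest centroid
        labels = dists.argmin(axis=1)
        # Step 5: move each centroid to the mean of its assigned points
        # (assumes no cluster becomes empty, for simplicity)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 6: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```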
Clustering depends on the initial centroids
Let’s take the same 12 samples, but with different initial centroids.
Initial centroids :
Feature 1    Feature 2
3.50         5.10
10.10        8.10
2.10         7.80
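To reproduce a run like this, scikit-learn’s KMeans accepts an ndarray of starting centroids through its init parameter. A minimal sketch, assuming the centroid table above and a stand-in X for the 12 samples:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((12, 2)) * 10   # stand-in for the 12 example samples

# The initial centroids from the table above, shape (k, n_features)
init_centroids = np.array([[3.50, 5.10],
                           [10.10, 8.10],
                           [2.10, 7.80]])

# n_init=1 because the starting points are fixed (no random restarts needed)
km = KMeans(n_clusters=3, init=init_centroids, n_init=1)
labels = km.fit_predict(X)
```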
How to better control initial centroids in Python?
‘KMeans’ class
‘init’ parameter => ‘k-means++’ (default), or ‘random’
• ‘k-means++’: Smarter initialization of the centroids & quicker convergence.
⇒ 1) Randomly select the first centroid from the data points.
⇒ 2) Compute the distance between each point and the nearest previously
chosen centroid.
⇒ 3) Select the next centroid from the data points, favoring points far from the
centroids already chosen (sampled with probability proportional to that squared distance).
⇒ 4) Repeat steps 2-3 until k centroids are determined.
• ‘random’: Choose n_clusters observations (rows) at random from the data for the
initial centroids.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
Let’s check the result from the KMeans class
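A minimal sketch of such a check, assuming a hypothetical feature array X; labels_, cluster_centers_, and inertia_ are the attributes the fitted KMeans object exposes:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((12, 2)) * 10   # hypothetical feature array

km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X)      # cluster index (0..k-1) for each sample

print(labels)                   # cluster assignment per sample
print(km.cluster_centers_)      # final centroid coordinates
print(km.inertia_)              # within-cluster sum of squared distances
```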
How to find the optimal number of clusters?
>> No perfect solution exists for finding the optimal number of clusters!
1) Elbow method : To identify the number of clusters at which the inertia*
stops decreasing rapidly (the “elbow” of the curve); see the sketch below.
(*Inertia: the sum of squared errors (SSE) between each sample and the centroid
of its cluster, summed over all clusters)
https://www.oreilly.com/library/view/statistics-for-machine/9781788295758/c71ea970-0f3c-4973-8d3a-b09a7a6553c1.xhtml
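A minimal sketch of the elbow method, assuming the same hypothetical X; the range of k values (1-10) is illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((200, 2)) * 10  # hypothetical feature array

ks = range(1, 11)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    km.fit(X)
    inertias.append(km.inertia_)    # SSE for this choice of k

plt.plot(ks, inertias, marker='o')
plt.xlabel('Number of clusters, k')
plt.ylabel('Inertia (SSE)')
plt.show()                          # look for the "elbow" where the drop flattens
```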
How to find the optimal number of clusters? (con’t)
>> No perfect solution exists for finding the optimal number of clusters!
2) Silhouette plot : To find the number of clusters for which the average
silhouette coefficient, s, is close to 1 (see the sketch below).
s(i) = (b(i) - a(i)) / max(a(i), b(i)), where -1 ≤ s(i) ≤ 1
• The cluster cohesion, a(i), is the average distance between a sample, x(i), and all other points
in the same cluster.
=> A greater a(i) indicates worse within-cluster similarity for the sample.
• The cluster separation, b(i), is the average distance between the sample, x(i), and all samples
in the nearest neighboring cluster.
=> A greater b(i) indicates better separation of the sample from the nearest cluster.
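A minimal sketch of computing silhouette coefficients with scikit-learn, assuming the same hypothetical X; silhouette_samples returns s(i) per sample and silhouette_score their average:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.random.default_rng(0).random((200, 2)) * 10  # hypothetical feature array

km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X)

s = silhouette_samples(X, labels)       # s(i) for every sample, each in [-1, 1]
print(silhouette_score(X, labels))      # average s: the closer to 1, the better
```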
In-class exercise 1: K-means clustering (KMC)
Let’s solve a clustering problem
via the KMC algorithm
1. Data preparation & import :
InClassData_Weight_Dist_USL.csv
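A minimal sketch of step 1, assuming the CSV file sits in the working directory and that pandas is used for the import; the exact column layout of the file is not shown on the slides:

```python
import pandas as pd

# Load the in-class dataset (file name from the slide; path is an assumption)
df = pd.read_csv('InClassData_Weight_Dist_USL.csv')
print(df.head())     # inspect the first rows and column names
X = df.values        # feature matrix for clustering
```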
In-class exercise 1: KMC (con’t)
2. Conduct the feature scaling
3. Set up the K-means clustering algorithm (see the sketch below)
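A minimal sketch of steps 2-3, assuming StandardScaler for the feature scaling (the slides do not specify which scaler) and the X array loaded in step 1:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Step 2: standardize each feature to zero mean and unit variance
X_std = StandardScaler().fit_transform(X)   # X from the import sketch above

# Step 3: set up and fit the K-means clustering algorithm (k to be tuned in steps 4-5)
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0)
labels = km.fit_predict(X_std)
```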
In-class exercise 1: KMC (con’t)
4. Execute the elbow method & the silhouette plot by varying the number
of clusters
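One way to run both diagnostics in a single loop; a sketch assuming the scaled X_std from the previous step:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

ks = range(2, 11)                   # the silhouette needs at least 2 clusters
inertias, sil_scores = [], []
for k in ks:
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    labels = km.fit_predict(X_std)  # X_std: scaled features from step 2
    inertias.append(km.inertia_)
    sil_scores.append(silhouette_score(X_std, labels))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ks, inertias, marker='o'); ax1.set(xlabel='k', ylabel='Inertia (SSE)')
ax2.plot(ks, sil_scores, marker='o'); ax2.set(xlabel='k', ylabel='Average silhouette')
plt.show()
```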
In-class exercise 1: KMC (con’t)
5. Identify the optimal k based on the visualized results
Take-home points (THPs)
-
-
-
…