Lesson 5 - Unsupervised Learning

INTRODUCTION TO

ARTIFICIAL INTELLIGENCE
BUI NGOC DUNG

CHAPTER 5: UNSUPERVISED LEARNING


K-MEANS CLUSTERING
UNSUPERVISED LEARNING
❑ Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset
and are allowed to act on that data without any supervision.
❑ The aim of an unsupervised algorithm is to find the underlying structure of the dataset, group the data
according to similarities, and represent the dataset in a compressed format.
CLUSTER
❑ Clustering organizes unlabeled data into groups of similar items, called clusters.
❑ A cluster is a collection of data items that are “similar” to one another and “dissimilar” to the data items in
other clusters.
CLUSTERING
Clustering is a type of unsupervised learning that automatically forms clusters of similar things.
K-MEANS CLUSTERING
K-means is an algorithm that finds k clusters in a given dataset. The number of clusters k is chosen by the user. Each cluster is
described by a single point known as the centroid, which lies at the center (the mean) of all the points in the
cluster.
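As a small illustration of the centroid idea, the sketch below computes the coordinate-wise mean of a cluster. The function name `centroid` and the sample points are assumptions made for this example, not part of the lesson.

```python
def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


# Hypothetical cluster of three 2-D points.
cluster = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
print(centroid(cluster))  # [3.0, 2.0]
```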
K-MEANS CLUSTERING
❑ Pros: Easy to implement
❑ Cons: Can converge to a local minimum; slow on very large datasets
❑ Works with: Numeric values
PSEUDO-CODE
Create k points for starting centroids (often randomly)
While any point has changed cluster assignment:
    for every point in our dataset:
        for every centroid:
            calculate the distance between the centroid and the point
        assign the point to the cluster with the lowest distance
    for every cluster:
        calculate the mean of the points in that cluster
        assign the centroid to the mean
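The pseudo-code above could be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not a reference implementation. The names `euclidean` and `k_means`, the `seed` parameter, the choice to start from k randomly picked data points, and the rule that an empty cluster keeps its previous centroid are all assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, seed=0):
    """Cluster `points` into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Create k starting centroids by picking k distinct data points at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    changed = True
    while changed:  # loop while any point has changed cluster assignment
        changed = False
        for i, p in enumerate(points):
            # Measure the distance to every centroid and keep the closest one.
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            if assignments[i] != nearest:
                assignments[i] = nearest
                changed = True
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                # Move the centroid to the mean of the points in its cluster.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical data: two well-separated groups of 2-D points.
data = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
centroids, labels = k_means(data, k=2)
print(centroids, labels)
```

Because the starting centroids are random, different seeds can number the clusters differently, and on less clearly separated data they can also lead to different local minima.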
DISTANCE MEASURE
❑ Distance measure determines the similarity between two elements and influences the shape of clusters.
❑ K-Means clustering supports various kinds of distance measures; the most commonly used is the Euclidean
distance between two points.
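For reference, the Euclidean distance between two points with n numeric features can be written as below. The symbols x, y, and n are notation introduced here for illustration, and the worked numbers are a made-up example.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
% Worked example with x = (1, 2) and y = (4, 6):
% d = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = 5
```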
GENERAL APPROACH TO K-MEANS CLUSTERING
1. Collect: Any method.
2. Prepare: Numeric values are needed for a distance calculation, and nominal values can be mapped into
binary values for distance calculations.
3. Analyze: Any method.
4. Train: Doesn’t apply to unsupervised learning.
5. Test: Apply the clustering algorithm and inspect the results. Quantitative error measurements such as sum
of squared error (introduced later) can be used.
6. Use: Anything you wish. Often, the cluster centers can be treated as representative data of the whole
cluster to make decisions.
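Two of the steps above lend themselves to a short sketch: mapping a nominal value into binary values (Prepare) and computing the sum of squared error of a clustering (Test). The helper names `one_hot` and `sum_squared_error`, the colour categories, and the sample points are hypothetical and chosen only for illustration.

```python
def one_hot(value, categories):
    """Map a nominal value to a list of 0/1 flags, one per known category."""
    return [1.0 if value == c else 0.0 for c in categories]


def sum_squared_error(points, centroids, assignments):
    """Total squared Euclidean distance from each point to its assigned centroid."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centroids[a]))
        for p, a in zip(points, assignments)
    )


# Prepare: a hypothetical record with a numeric height and a nominal colour.
colours = ["red", "green", "blue"]
row = [1.8] + one_hot("green", colours)
print(row)  # [1.8, 0.0, 1.0, 0.0]

# Test: three points, two centroids, and the cluster index of each point.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
centroids = [[1.0, 0.0], [10.0, 0.0]]
assignments = [0, 0, 1]
print(sum_squared_error(points, centroids, assignments))  # 2.0
```

A lower sum of squared error means the points sit closer to their centroids, so it can be used to compare clusterings of the same data that use the same k.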
ILLUSTRATION
❑ https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
❑ http://stanford.edu/class/ee103/visualizations/kmeans/kmeans.html
THANK YOU