Unsupervised Learning
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda
• What is Clustering?
• Types of Clustering
Why do we need Machine Learning?
Why has Machine Learning become more popular these days?
• In the past, most data was structured. As the volume of data grows, structured data makes up a shrinking share of it, so we need data science techniques to handle these massive amounts of data.
• This data can be used to derive business insights and uncover hidden trends.
• These insights help organizations predict the future
• They help reduce production costs
• Models built from the data give machines the ability to make predictions on their own
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that allows a system to learn and improve automatically from experience without being explicitly programmed
Traditional vs Machine Learning
Traditional Programming: Data + Program → Output
Machine Learning: Data + Output → Model
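To make the contrast concrete, here is a minimal Python sketch under illustrative assumptions: the hand-written rule `traditional_program` and the sample data are hypothetical, and the "model" is a straight line fitted by ordinary least squares, which is just one simple learning method chosen for brevity.

```python
# Traditional programming: a human writes the rule (the program) by hand.
def traditional_program(x):
    return 2 * x + 1  # hypothetical hand-written rule


# Machine learning: the rule is learned from example data and their outputs.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


data = [0, 1, 2, 3, 4]                              # input data
outputs = [traditional_program(x) for x in data]    # observed outputs
slope, intercept = fit_line(data, outputs)          # the learned "model"


def model(x):
    """Predict with the learned parameters instead of a hand-written rule."""
    return slope * x + intercept


print(model(10))  # → 21.0
```

The learned model recovers the hand-written rule here only because the toy data is noise-free and exactly linear.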
Machine Learning
Life cycle of Machine Learning
[Figure: life-cycle diagram; recoverable stages: apply a Machine Learning algorithm, predict your model's accuracy, deploy the model]
Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Unsupervised Learning
Problem to be resolved: classifying unstructured, unlabeled data into different categories, i.e. predicting the class of unlabeled, unstructured data
Solution: This is where supervised learning fails and unsupervised learning algorithms come into the picture.
Training data for unsupervised learning is a collection of information without any labels
Unsupervised Learning - Example
• A set of relevant information is fed into the system
• The system identifies different types of bike using features like color, size, speed limit, average, etc., and categorizes them
• When a new bike is shown, the system analyses its features and puts it into the category whose items have the most similar features
NOTE: Points within the same cluster are similar to each other but different from the points in other clusters
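As a minimal sketch of how a new bike could be placed into an existing category, assume the bikes have already been grouped (for example by a clustering algorithm) and each bike is described by two hypothetical, already-scaled numeric features. The category names, feature values and the nearest-centroid rule below are illustrative assumptions, not part of the original example.

```python
import math

# Hypothetical groups found earlier; each bike is (size, top_speed), both scaled to 0..1.
# The names are only labels for readability; unsupervised learning does not need them.
categories = {
    "scooter": [(0.20, 0.30), (0.25, 0.35), (0.15, 0.25)],
    "sports": [(0.70, 0.90), (0.75, 0.95), (0.80, 0.85)],
}


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest_category(new_point, groups):
    """Return the group whose centroid is closest to new_point (Euclidean distance)."""
    return min(groups, key=lambda name: math.dist(new_point, centroid(groups[name])))


print(nearest_category((0.72, 0.88), categories))  # → sports
```

Because the rule relies on distances, features on very different scales (e.g. engine size vs. speed) would normally be scaled first; that is why the sketch assumes pre-scaled values.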
Example Of Clustering
Why is Clustering needed?
• Organizing data into clusters reveals the internal structure of the data
Types Of Clustering
[Figure: data points grouped into Cluster 0, Cluster 1 and Cluster 2]
[Figure: data points grouped into Cluster 1 and Cluster 2]
Hierarchical clustering: clusters have a parent-child relationship / tree-like structure.
[Figure: tree of clusters C0, C1 and C2]
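The parent-child (tree-like) structure of hierarchical clustering can be illustrated with a naive agglomerative sketch. It assumes single linkage (cluster distance = closest pair of points), which is only one of several possible linkage rules, and uses hypothetical 2-D points; it is written for clarity, not efficiency.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Every point starts as its own cluster (ids 0..n-1). The two closest
    clusters are merged repeatedly; each merge creates a new parent id.
    The returned (child_a, child_b, parent) triples describe the tree.
    """
    clusters = {i: [p] for i, p in enumerate(points)}  # cluster id -> member points
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Single linkage: distance between clusters = smallest point-to-point distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: min(math.dist(p, q)
                                 for p in clusters[pair[0]]
                                 for q in clusters[pair[1]]),
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))  # new parent node with children a and b
        next_id += 1
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical data
merges = single_linkage(points)
print(merges)  # → [(0, 1, 5), (2, 3, 6), (5, 6, 7), (4, 7, 8)]
```

Cutting the tree at different heights (i.e. stopping after fewer merges) yields different numbers of clusters.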
What Is K-Means Clustering?
K-Means is a clustering algorithm that groups similar data points into K clusters
Business Application Of K-means
Behavioural Segmentation
Inventory Categorization
Understanding K - Means Algorithm
Number of Clusters = 3
Understanding K - Means Algorithm
Step 1: Identify the number of clusters (K = 3 in this case)
Step 2: Randomly select 3 distinct data points as the initial clusters
Step 3: Measure the distance between the 1st point and each of the 3 selected clusters
Step 4: Assign the 1st point to the nearest cluster
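Steps 1 to 3 can be sketched in a few lines of Python. The data points are hypothetical, the seed is fixed only so the sketch is reproducible, and Euclidean distance is assumed as the distance measure.

```python
import math
import random

# Hypothetical 2-D data points.
points = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (8.0, 8.0), (1.0, 0.6), (9.0, 11.0)]

k = 3  # Step 1: number of clusters

# Step 2: randomly select k distinct data points as the initial cluster centres.
rng = random.Random(0)  # fixed seed (assumption) for reproducibility
centres = rng.sample(points, k)

# Step 3: Euclidean distance from the 1st point to each selected centre.
first = points[0]
distances = [math.dist(first, c) for c in centres]

# The 1st point then joins the cluster whose centre is nearest.
nearest = distances.index(min(distances))
print(centres, distances, nearest)
```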
Understanding K - Means Algorithm
Step 5: Calculate the new mean of the orange cluster, including the new point
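Step 5 does not require storing and re-averaging every member: a running mean can be updated from the old mean and the member count. This is a minimal sketch; the function name `add_point_to_mean` and the coordinates are hypothetical.

```python
def add_point_to_mean(mean, count, point):
    """Return the updated (mean, count) after adding `point` to a cluster.

    mean:  current cluster mean as a tuple of floats
    count: number of points already in the cluster
    Uses new_mean = old_mean + (point - old_mean) / new_count per coordinate.
    """
    new_count = count + 1
    new_mean = tuple(m + (p - m) / new_count for m, p in zip(mean, point))
    return new_mean, new_count


# The orange cluster starts as its randomly chosen point (1 member)...
orange_mean, orange_count = (2.0, 4.0), 1
# ...and then absorbs the 1st data point (hypothetical coordinates).
orange_mean, orange_count = add_point_to_mean(orange_mean, orange_count, (4.0, 8.0))
print(orange_mean, orange_count)  # → (3.0, 6.0) 2
```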
Understanding K - Means Algorithm
Which cluster does point 2 belong to? How do we decide?
Repeat the same procedure, but measure the distance to the updated orange mean
[Figure: distance from point 2 to the green cluster]
Understanding K - Means Algorithm
Point 2 belongs to the orange cluster
Calculate the new cluster mean including the new point
Understanding K - Means Algorithm
Which cluster does point 3 belong to? How do we decide?
Repeat the same procedure, but measure the distance to the red cluster mean
Understanding K - Means Algorithm
Point 3 belongs to the orange cluster
Measure the distances, add the 3rd point to the cluster with the minimum distance (orange), and calculate the new cluster mean including the new point
Understanding K - Means Algorithm
Which cluster does point 5 belong to? How do we decide?
Understanding K - Means Algorithm
The remaining points all lie closest to the green cluster, so they all belong to the green cluster
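The point-by-point walk-through above can be condensed into one function. This is a sketch under stated assumptions: Euclidean distance, hypothetical 1-D data stored as 1-tuples, and starting points given by index. It follows the slides' sequential update (each mean changes as soon as a point joins); the widely used batch variant (Lloyd's algorithm) instead assigns all points first and then recomputes every mean, repeating until assignments stop changing.

```python
import math


def cluster_mean(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))


def sequential_pass(points, centre_idx):
    """One point-by-point K-Means pass.

    points:     list of equal-length numeric tuples
    centre_idx: indices of the K randomly chosen starting points
    Each remaining point joins the cluster whose *current* mean is nearest,
    so that mean already includes it when the next point is processed.
    """
    clusters = [[points[i]] for i in centre_idx]
    for j, p in enumerate(points):
        if j in centre_idx:
            continue  # starting points already sit in their own clusters
        nearest = min(range(len(clusters)),
                      key=lambda i: math.dist(p, cluster_mean(clusters[i])))
        clusters[nearest].append(p)
    return clusters


# Hypothetical 1-D data with K = 3 starting points (indices 0, 3 and 5).
points = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (20.0,), (21.0,), (22.0,)]
clusters = sequential_pass(points, [0, 3, 5])
print(clusters)
# → [[(1.0,), (2.0,), (3.0,)], [(10.0,), (11.0,)], [(20.0,), (21.0,), (22.0,)]]
```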
Understanding K - Means Algorithm
Original/Expected Result
Understanding K - Means Algorithm
Iteration 1:
Understanding K - Means Algorithm
Iteration 2: Start again from the first step, but with different initial random points (compared with the 1st iteration)
Step 3: Measure the distance between the points and the 3 selected clusters
Understanding K - Means Algorithm
Iteration 3: Restart from scratch with different initial random points (compared with the 2nd iteration)
Step 3: Measure the distance between the points and the 3 selected clusters
Understanding K - Means Algorithm
[Figure: clustering results of Iteration 1, Iteration 2 and Iteration 3 compared]
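The repeated iterations with different random starting points can be sketched as a loop over restarts. The sketch assumes the results are compared by their total within-cluster variation (WSS) and the lowest one is kept; the function names, seed and data are hypothetical.

```python
import math
import random


def cluster_mean(cluster):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))


def assign_sequentially(points, centre_idx):
    """Point-by-point assignment: each point joins the nearest current mean."""
    clusters = [[points[i]] for i in centre_idx]
    for j, p in enumerate(points):
        if j not in centre_idx:
            nearest = min(range(len(clusters)),
                          key=lambda i: math.dist(p, cluster_mean(clusters[i])))
            clusters[nearest].append(p)
    return clusters


def total_variation(clusters):
    """Sum of squared distances from each point to its cluster mean (WSS)."""
    return sum(math.dist(p, cluster_mean(c)) ** 2 for c in clusters for p in c)


def best_of_restarts(points, k, restarts, seed=0):
    """Repeat with different random starting points; keep the lowest-WSS result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centre_idx = rng.sample(range(len(points)), k)  # new random start each time
        clusters = assign_sequentially(points, centre_idx)
        score = total_variation(clusters)
        if best is None or score < best[0]:
            best = (score, clusters)
    return best


points = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (20.0,), (21.0,), (22.0,)]
score, best = best_of_restarts(points, k=3, restarts=10)
print(score, best)
```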
How To Decide The Value Of K?
This time k = 3 was known, but what if the exact value of k is unknown?
The idea behind partitioning methods such as K-Means is to define clusters such that the total intra-cluster variation, i.e. the total within-cluster sum of squares (WSS) summed over all clusters, is minimized.
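Written out, WSS = Σ over clusters k of Σ over points x in cluster C_k of ‖x − μ_k‖², where μ_k is the mean of cluster C_k. A minimal Python sketch of this quantity, assuming each cluster is a list of numeric tuples and using hypothetical 1-D data, shows that a natural grouping scores far lower than a poor one:

```python
def wss(clusters):
    """Total within-cluster sum of squares.

    clusters: list of clusters, each a list of equal-length numeric tuples.
    For every cluster, add the squared Euclidean distance of each point to
    that cluster's mean, then sum over all clusters.
    """
    total = 0.0
    for cluster in clusters:
        centre = [sum(v) / len(cluster) for v in zip(*cluster)]
        for point in cluster:
            total += sum((x - m) ** 2 for x, m in zip(point, centre))
    return total


good = [[(1.0,), (2.0,), (3.0,)], [(10.0,), (11.0,)]]
poor = [[(1.0,), (2.0,)], [(3.0,), (10.0,), (11.0,)]]
print(wss(good))  # → 2.5
print(wss(poor))  # → 38.5
```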
How To Decide The Value Of K: Elbow Method
The Elbow method looks at the total WSS as a function of the number of clusters
Choose the number of clusters such that adding another cluster does not noticeably improve the total WSS.
[Figure: elbow plot of intra-cluster variance (WSS) against the number of clusters]
How To Decide The Value Of K: Elbow Method
Run the clustering algorithm (e.g. K-Means) for different values of k, for example varying k from 1 to 10 clusters
For each k, calculate the total within-cluster sum of squares (WSS)
Plot the curve of WSS against the number of clusters k
The location of a bend (knee) in the plot indicates the appropriate number of clusters
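The four steps above can be sketched in plain Python. Assumptions: K-Means is implemented here as the standard batch variant (Lloyd's algorithm) with a few seeded random restarts, the data are three hypothetical, well-separated groups, the WSS values are printed rather than plotted, and the "largest second difference" rule for locating the bend is only one simple heuristic among several.

```python
import math
import random


def kmeans_wss(points, k, n_init=5, max_iter=100, seed=0):
    """Lloyd's K-Means with random restarts; returns the lowest total WSS found."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_init):
        centres = rng.sample(points, k)  # k distinct starting points
        for _ in range(max_iter):
            # Assignment step: each point goes to its nearest centre.
            groups = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda c: math.dist(p, centres[c]))
                groups[i].append(p)
            # Update step: move each centre to the mean of its group
            # (an empty group keeps its previous centre).
            new_centres = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centres[i]
                           for i, g in enumerate(groups)]
            if new_centres == centres:
                break  # assignments are stable
            centres = new_centres
        wss = sum(math.dist(p, centres[i]) ** 2
                  for i, g in enumerate(groups) for p in g)
        best = min(best, wss)
    return best


# Hypothetical data: three small square groups of 4 points each.
offsets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
          for dx, dy in offsets]

k_values = range(1, 11)  # vary k from 1 to 10
wss_values = [kmeans_wss(points, k) for k in k_values]
for k, w in zip(k_values, wss_values):
    print(k, round(w, 2))

# Simple bend heuristic (assumption): the k with the largest second difference,
# i.e. where the WSS curve stops dropping steeply.
second_diff = {k: wss_values[k - 2] - 2 * wss_values[k - 1] + wss_values[k]
               for k in range(2, 10)}
elbow_k = max(second_diff, key=second_diff.get)
print("elbow at k =", elbow_k)
```

In practice the curve would usually be plotted (e.g. with matplotlib) and the bend judged visually, since numeric knee rules can disagree on noisy data.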
Thank You