
UnSupervised Learning

The document discusses unsupervised learning and clustering. It covers why machine learning is needed, the life cycle of building a machine learning model, types of machine learning including unsupervised learning, what clustering is and its types like exclusive, overlapping and hierarchical clustering.

Uploaded by

Pandu K

Unsupervised Learning

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Agenda

1 Why do we need Machine Learning?

2 What is Machine Learning?

3 Life cycle to build a model with ML

4 What is Unsupervised Learning?

5 What is Clustering?

6 Types of clustering

7 K-Means Clustering

8 Demo on K-Means Clustering
Why do we need Machine Learning?
Why has machine learning become more popular these days?
• In the past, most data was available in a structured format. As the volume of data grows, structured data makes up a shrinking share of it, so data science techniques are needed to handle the massive amount of data.
• This data can be used to extract business insights and uncover hidden trends.
• These insights help the organization predict future outcomes.
• They help reduce production costs.
• A model built from the data gives the machine the ability to make predictions on its own.
What is Machine Learning?
Machine learning is a subset of artificial intelligence (AI) that allows a system to automatically learn and improve from experience without being explicitly programmed.

Figure: Training Data, Model Building, Testing Data
Traditional vs Machine Learning
Traditional Programming: Data + Program → Output

Machine Learning: Data + Output → Model
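
The contrast between the two approaches can be illustrated with a small sketch. Everything in it is an assumption made for illustration (the speed values, the labels, the hand-picked threshold of 100, and the midpoint rule used as the "learning" step); it is not taken from the slides.

```python
# Traditional programming: a person writes the rule (the "program").
def is_fast_rule(speed_kmh):
    return speed_kmh > 100  # hand-picked threshold (illustrative assumption)


# Machine learning: the rule is derived from data plus known outputs.
def learn_threshold(speeds, labels):
    """Return the midpoint between the fastest 'slow' and slowest 'fast' example."""
    fastest_slow = max(s for s, fast in zip(speeds, labels) if not fast)
    slowest_fast = min(s for s, fast in zip(speeds, labels) if fast)
    return (fastest_slow + slowest_fast) / 2


speeds = [60, 80, 90, 120, 140, 160]               # data (made up)
labels = [False, False, False, True, True, True]   # known outputs (made up)
threshold = learn_threshold(speeds, labels)        # the learned "model"


def is_fast_learned(speed_kmh):
    return speed_kmh > threshold
```

In the first function the threshold comes from the programmer; in the second it comes from the examples, so changing the data changes the model without rewriting the code.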

Machine Learning

Process to train a Machine Learning model

Life cycle of Machine Learning

1 Understand the business problem

2 Data Acquisition

3 Data Cleaning

4 Exploratory Data Analysis

5 Machine Learning Algorithm

6 Predict your model accuracy

7 Deploy the model
Types of Machine Learning

Supervised Learning

Unsupervised Learning

Reinforcement Learning
Unsupervised Learning
• Problem to be resolved: classifying unstructured and unlabeled data into different categories, or predicting the class of unlabeled and unstructured data.

• Solution: this is where supervised learning fails and unsupervised learning algorithms come into the picture.

• The unsupervised learning algorithm clusters the input data into different classes on the basis of their statistical properties.

Example: Cluster different bikes based upon their speed limit, acceleration, and average.
Unsupervised Learning

Training data for unsupervised learning is a collection of information without any labels.
Unsupervised Learning - Example
• A set of relevant information is fed into the system.
• The system identifies different types of bikes using features such as color, size, speed limit, and average, and categorizes them.
• When a new bike is shown, the system analyzes its features and puts it into the category whose items have similar features.

The groups depend on the attributes used.
What Is Clustering?
Clustering is the process of dividing a dataset into groups of similar data points.

NOTE: Points within the same cluster are similar to each other but different from points in other clusters.
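
One way to make the note concrete is to compare average distances inside clusters with average distances across clusters. The sketch below assumes two hand-made 2-D clusters and uses Euclidean distance as the similarity measure; both choices are illustrative assumptions, not something the slides specify.

```python
import math
from itertools import combinations, product

# Two made-up clusters of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_distance(pairs):
    """Average Euclidean distance over a collection of point pairs."""
    pairs = list(pairs)
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Pairs drawn from the same cluster versus pairs that span both clusters.
within = mean_distance(
    list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
)
between = mean_distance(product(cluster_a, cluster_b))

print("mean within-cluster distance:", within)
print("mean between-cluster distance:", between)
```

For a sensible clustering, the within-cluster average should be clearly smaller than the between-cluster average.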

Example Of Clustering

Example 1: Cluster of different colors of fruits

Example 2: Cluster of different types of garbage
Why Is Clustering Needed?

1 Determine the intrinsic grouping in a set of unlabeled data

2 Organize data into clusters, thereby showing the internal structure of the data

3 Create partitions in the dataset
Types Of Clustering

Clustering

Exclusive Clustering Overlapping Clustering Hierarchical Clustering

Types Of Clustering
Exclusive Clustering

An item belongs exclusively to one cluster, not several.

Ex: K-means Clustering

Figure: Cluster 0, Cluster 1, Cluster 2
Types Of Clustering
Overlapping Clustering

An item can belong to multiple clusters.

Ex: Fuzzy C-means clustering

Figure: Cluster 1, Cluster 2
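
In overlapping clustering, each item gets a degree of membership in every cluster instead of a single label. The sketch below computes memberships with the commonly used fuzzy c-means formula for fixed cluster centers. The function name, the centers, the sample points, and the fuzzifier `m = 2` are all assumptions chosen for illustration, and the center-update step of the full algorithm is omitted.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster; the values sum to 1.

    Uses u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the
    distance from the point to center j and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # A point lying exactly on a center belongs fully to that center.
        return [z / sum(zeros) for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]                  # made-up cluster centers
near_first = fuzzy_memberships((1.0, 0.0), centers)
in_between = fuzzy_memberships((2.0, 0.0), centers)
print(near_first, in_between)
```

A point close to one center gets most of its membership there, while a point halfway between two centers is shared equally between them.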
Types Of Clustering
Hierarchical Clustering

Clusters have a parent-child relationship, forming a tree-like structure.

Ex: Hierarchical Clustering

Figure: Cluster 0, Cluster 1, and Cluster 2 arranged as a tree (C0, C1, C2)
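
The tree-like structure can be built bottom-up by repeatedly merging the two closest clusters. The sketch below assumes an agglomerative approach with single linkage (cluster distance is the distance between the closest pair of points); the slides do not say which variant they have in mind, and the function name and data points are made up.

```python
import math


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples;
    each merge creates a parent whose children are the two merged clusters.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of points.
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
history = agglomerate(points)
for left, right, distance in history:
    print(left, "+", right, "at distance", round(distance, 2))
```

Reading the merge history from last to first gives the tree from the root down: the final merge is the root, and earlier merges are its descendants.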
What Is K-Means Clustering?

K-Means is a clustering algorithm that groups similar elements or data points into clusters.

NOTE: ‘K’ in K-Means represents the number of clusters
Business Applications Of K-Means

• Behavioural Segmentation

• Inventory Categorization

• Sorting sensor measurements

• Detecting bots or anomalies
Understanding K - Means Algorithm

Number of Clusters = 3

Understanding K - Means Algorithm

This is our final output… Let’s see how to do it!

Number of Clusters = 3

Understanding K - Means Algorithm
Step 1: Identify the number of clusters (K = 3 in this case)
Step 2: Randomly select 3 distinct data points; these act as the initial clusters (orange, blue, and green)
Step 3: Measure the distance between the 1st point and each of the 3 selected clusters

Distance from point 1 to the orange cluster

Distance from point 1 to the blue cluster

Distance from point 1 to the green cluster
Understanding K - Means Algorithm
Step 4: Assign the 1st point to the nearest cluster (orange in this case)

Repeat the process
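
Steps 1 to 4 can be sketched in a few lines. The data points, the random seed, and the use of Euclidean distance are assumptions made for illustration; the slides do not name a distance measure or give coordinates.

```python
import math
import random

# Made-up 2-D data points.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]

K = 3                                   # Step 1: number of clusters

rng = random.Random(0)                  # fixed seed so the run is repeatable
centers = rng.sample(points, K)         # Step 2: K distinct data points

# The first point that was not picked as an initial cluster.
first_point = next(p for p in points if p not in centers)

# Step 3: distance from that point to each of the K clusters.
distances = [math.dist(first_point, c) for c in centers]

# Step 4: assign the point to the cluster with the smallest distance.
nearest = min(range(K), key=lambda i: distances[i])

print("initial clusters:", centers)
print("point", first_point, "goes to cluster", nearest)
```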
Understanding K - Means Algorithm
Step 5: Calculate the mean value including the new point for the orange cluster
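
The mean can be updated without revisiting the points already in the cluster. The sketch below uses the standard running-mean update; the function name and the coordinates are illustrative assumptions.

```python
def add_point_to_cluster(mean, count, point):
    """Return the updated (mean, count) after adding one point to a cluster.

    Running-mean update: new_mean = mean + (point - mean) / new_count.
    """
    new_count = count + 1
    new_mean = tuple(m + (x - m) / new_count for m, x in zip(mean, point))
    return new_mean, new_count


# The cluster starts as its single initial point (made-up coordinates).
mean, count = (2.0, 2.0), 1
mean, count = add_point_to_cluster(mean, count, (4.0, 6.0))
print(mean, count)
```

After the update, the cluster mean is the average of the initial point and the newly assigned point, which is the value the next point's distance is measured against.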

Understanding K - Means Algorithm
Which cluster does point 2 belong to? How do we find out?
Repeat the same procedure, but measure the distance to the orange mean

Distance from point 2 to the orange cluster

Distance from point 2 to the blue cluster

Distance from point 2 to the green cluster
Understanding K - Means Algorithm
Point 2 belongs to the orange cluster
Calculate the new cluster mean including the new point

Understanding K - Means Algorithm
Which cluster does point 3 belong to? How do we find out?
Repeat the same procedure, but measure the distance to the new orange mean

Distance from point 3 to the new orange mean

Distance from point 3 to the blue cluster

Distance from point 3 to the green cluster
Understanding K - Means Algorithm
Point 3 belongs to the orange cluster
Measure the distances, add the 3rd point to the cluster with the minimum distance (orange), and calculate the new cluster mean including the new point
Understanding K - Means Algorithm
Which cluster does point 5 belong to? How do we find out?

REPEAT THE SAME STEPS AGAIN…
Understanding K - Means Algorithm
Since the rest of the points lie closest to the green cluster, they all belong to the green cluster
Understanding K - Means Algorithm

Result from 1st iteration

Original/Expected Result

Understanding K - Means Algorithm

Total variation within the cluster

Iteration 1:

Understanding K - Means Algorithm

Iteration 2: Start over from the beginning, but with different initial random points (as compared to the 1st iteration)

• Step 1: Select the number of clusters, i.e. K = 3

• Step 2: Randomly select 3 distinct data points

• Step 3: Measure the distance between the points and the 3 selected clusters
Understanding K - Means Algorithm

Iteration 3: Restart from scratch with different initial random points (as compared to the 2nd iteration)

• Step 1: Select the number of clusters, i.e. K = 3

• Step 2: Randomly select 3 distinct data points

• Step 3: Measure the distance between the points and the 3 selected clusters
Understanding K - Means Algorithm

Finally, the iteration with the minimum total variation is selected

Iteration 1:

Iteration 2:

Iteration 3:
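
The whole procedure, including the restarts that the slides call iterations, can be sketched as follows. The function names, the data points, and the seed are illustrative assumptions. Each run follows the walk-through above (add points one at a time and update the mean after each), and total variation is taken to be the sum of squared Euclidean distances to the cluster means.

```python
import math
import random


def one_pass_kmeans(points, k, rng):
    """One run: seed k clusters with random points, then add every other
    point to its nearest cluster and update that cluster's mean."""
    seeds = rng.sample(range(len(points)), k)
    means = [points[i] for i in seeds]
    counts = [1] * k
    labels = {i: c for c, i in enumerate(seeds)}
    for i, p in enumerate(points):
        if i in labels:
            continue  # seed points already belong to their own cluster
        c = min(range(k), key=lambda j: math.dist(p, means[j]))
        counts[c] += 1
        means[c] = tuple(m + (x - m) / counts[c] for m, x in zip(means[c], p))
        labels[i] = c
    return [labels[i] for i in range(len(points))], means


def total_variation(points, labels, means):
    """Sum of squared distances from every point to its cluster mean."""
    return sum(math.dist(p, means[c]) ** 2 for p, c in zip(points, labels))


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3),      # made-up data
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
          (9.0, 1.0), (9.1, 1.2), (8.8, 0.9)]

rng = random.Random(42)
runs = []
for _ in range(3):                      # three restarts, as in the slides
    labels, means = one_pass_kmeans(points, 3, rng)
    runs.append((total_variation(points, labels, means), labels))

best_variation, best_labels = min(runs, key=lambda run: run[0])
print("variation per run:", [round(v, 2) for v, _ in runs])
print("selected labels:", best_labels)
```

Because the initial points are random, different runs can end in different groupings; keeping the run with the lowest total variation is a simple guard against a poor start.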

How To Decide The Value Of K?
This time k = 3 was known, but what if the exact value of k is unknown?

The idea behind partitioning is to define clusters such that the total intra-cluster variation, or total within-cluster sum of squares (WSS), is minimized.

NOTE: The total WSS measures the compactness of the clustering
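
Written out, the total WSS is commonly defined as below. The symbols are assumptions introduced here, since the slides give no notation: $C_j$ is the set of points in cluster $j$, $\mu_j$ is the mean of that cluster, and distance is measured as squared Euclidean distance.

```latex
\mathrm{WSS} \;=\; \sum_{j=1}^{k} \; \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j \;=\; \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

A smaller WSS means the points sit closer to their cluster means, that is, the clusters are more compact.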

How To Decide The Value Of K: Elbow Method
• The Elbow method looks at the total WSS as a function of the number of clusters.
• The number of clusters should be chosen so that adding another cluster does not substantially improve the total WSS.

Figure: intra-cluster variance (y-axis) against the number of clusters (x-axis)
How To Decide The Value Of K: Elbow Method
• Run the clustering algorithm for different values of k, for example varying k from 1 to 10 clusters
• For each k, calculate the total within-cluster sum of squares (WSS)
• Plot the curve of WSS against the number of clusters k
• The location of a bend (knee) in the plot indicates the appropriate number of clusters
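
The elbow procedure can be sketched as a loop over k. The synthetic data (three Gaussian blobs), the seed, the five restarts per k, and the function names are all assumptions made for illustration, and the clustering routine is the one-point-at-a-time variant from the walk-through above. The sketch prints the WSS values instead of plotting them so that it needs no plotting library.

```python
import math
import random


def one_pass_kmeans(points, k, rng):
    """Seed k clusters with random points, then add every other point to its
    nearest cluster and update that cluster's mean."""
    seeds = rng.sample(range(len(points)), k)
    means = [points[i] for i in seeds]
    counts = [1] * k
    labels = {i: c for c, i in enumerate(seeds)}
    for i, p in enumerate(points):
        if i in labels:
            continue
        c = min(range(k), key=lambda j: math.dist(p, means[j]))
        counts[c] += 1
        means[c] = tuple(m + (x - m) / counts[c] for m, x in zip(means[c], p))
        labels[i] = c
    return [labels[i] for i in range(len(points))], means


def wss(points, labels, means):
    """Total within-cluster sum of squares."""
    return sum(math.dist(p, means[c]) ** 2 for p, c in zip(points, labels))


# Synthetic data: three blobs of ten points each (made up for illustration).
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(10)]

wss_by_k = {}
for k in range(1, 11):
    # Keep the best of a few restarts for each k.
    runs = [one_pass_kmeans(points, k, rng) for _ in range(5)]
    wss_by_k[k] = min(wss(points, labels, means) for labels, means in runs)

for k, value in wss_by_k.items():
    print(k, round(value, 1))
```

Plotting these values against k and looking for the point where the curve flattens gives the elbow; on data with three well-separated groups the bend would be expected near k = 3.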
Thank You
