Summary - Machine Learning (Part 2)
IYKRA DATA FELLOWSHIP BATCH 3
Outline
• Introduction to Clustering
• Clustering Method
• K-Means
• Hierarchical
• Model Evaluation
• Cross Validation
• Model Performance and Selection
Clustering is the process of dividing the entire
data into groups (also known as clusters)
based on the patterns in the data.
Such problems, without any fixed target variable, are known as
unsupervised learning problems. In these problems, we only have
the independent variables and no target/dependent variable.
K-Means Clustering
The K-means clustering algorithm
computes the cluster centroids and
iterates until the centroids stop
changing, i.e. until it settles on
the best centroids it can find.
Working of K-Means Algorithm
• First, specify the number of clusters, K, that the
algorithm should generate (a good value of K can be
identified with the elbow method).
• Next, randomly select K data points as the initial
centroids and assign each data point to the cluster of
its nearest centroid.
• Then compute the centroid of each cluster as the mean
of the data points assigned to it.
• Keep repeating the assignment and centroid steps until
the assignment of data points to clusters no longer
changes; the centroids at that point are the result.
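The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, its parameters (`max_iter`, `seed`), the use of Euclidean distance, and the two synthetic blobs are all assumptions made for the example. The function also returns the inertia (the sum of squared distances from each point to its centroid), which is the quantity usually plotted against K in the elbow method.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels, inertia) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Use k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Inertia: sum of squared distances to the assigned centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return centroids, labels, inertia

# Hypothetical demo data: two well-separated 2-D blobs of 50 points each.
rng = random.Random(42)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels, inertia = kmeans(points, k=2)

# Inertia for several values of K; the "elbow" is where the curve flattens.
for k in range(1, 6):
    print(k, round(kmeans(points, k)[2], 2))
```

On data like this the inertia should drop sharply from K=1 to K=2 and only slowly afterwards, which is the pattern the elbow method looks for. Because the initial centroids are random, a different `seed` can lead to a different result on less cleanly separated data.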
Advantages and Disadvantages of K-Means
ADVANTAGES
• It is very easy to understand and implement.
• If we have a large number of variables, K-means is usually
faster than Hierarchical clustering.
• On re-computation of centroids, an instance can change its
cluster.
• Tighter clusters are formed with K-means as compared to
Hierarchical clustering.
DISADVANTAGES
• It is a bit difficult to predict the number of clusters, i.e.
the value of K.
• The output is strongly impacted by initial inputs such as the
number of clusters (value of K).
• The order of the data can have a strong impact on the final
output.
• It is very sensitive to rescaling. If we rescale our data by
means of normalization or standardization, the output can
change substantially.
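The sensitivity to rescaling comes from the distance calculation in the assignment step. The sketch below uses a made-up toy dataset (ages in years, incomes in dollars; all values are hypothetical) and two illustrative helper functions, `standardize` and `nearest`, to show that the nearest neighbour of the same customer can change once the features are standardized.

```python
import math
import statistics

# Hypothetical toy data: (age in years, income in dollars).
customers = [(25, 32000), (26, 30000), (27, 31000),
             (60, 32100), (61, 30100), (62, 31100)]

def standardize(rows):
    """Rescale every column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def nearest(point, candidates):
    """Index of the candidate closest to point (Euclidean distance)."""
    return min(range(len(candidates)),
               key=lambda i: math.dist(point, candidates[i]))

scaled = standardize(customers)

# Compare customer 0 against a similar-age customer (index 2)
# and a similar-income customer (index 3).
raw_choice = [2, 3][nearest(customers[0], [customers[2], customers[3]])]
scaled_choice = [2, 3][nearest(scaled[0], [scaled[2], scaled[3]])]
print(raw_choice, scaled_choice)
```

In raw units the income column dominates the distance, so customer 0 is closest to the similar-income customer (index 3). After standardization both columns carry comparable weight and the similar-age customer (index 2) becomes the closest. K-means assigns points by exactly this kind of distance, so its clusters can shift in the same way.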
Applications of K-Means
• Document clustering
• Image segmentation
• Image compression
• Customer segmentation