Module 3 - 1
Unsupervised Learning
Contents
⦿ Pattern classification by distance function: Measures of similarity.
⦿ Clustering criteria.
⦿ K-means clustering
⦿ Pattern classification by likelihood function
⦿ Pattern classification as a statistical decision problem
⦿ Bayes classifier for normal patterns
Pattern classification by distance
function: Measures of similarity
- KNN Classifier
Intuition of KNN Classifier
(Measure of Dissimilarity)
⦿ Let O1 and O2 be two objects from the universe of objects.
⦿ The distance (dissimilarity) between the two is written D(O1, O2).
⦿ K nearest neighbors is a simple algorithm that stores all
available cases and classifies new cases based on a similarity
measure (e.g., distance functions).
⦿ Non-parametric method: no assumption is made about the underlying data distribution.
Distance function for numerical
attributes
⦿ Euclidean distance
⦿ City block distance (Manhattan distance)
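The two distance measures above can be sketched as follows. This is a minimal illustration for numerical attributes represented as tuples; the function names are ours, not from the slides.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared
    # coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City-block (rectilinear) distance: sum of absolute
    # coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))
```

For the same pair of points, the city-block distance is always at least as large as the Euclidean distance, e.g. `euclidean((0, 0), (3, 4))` is 5.0 while `manhattan((0, 0), (3, 4))` is 7.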
Different distance measures
KNN Algorithm
⦿ A case is classified by a majority vote of its neighbors, with the
case being assigned to the class most common amongst its K
nearest neighbors measured by a distance function.
Consider K = 2.
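The majority-vote rule described above can be sketched as below. This is an illustrative implementation, not the one used in the lecture; note that an odd K (the sketch defaults to K = 3) avoids tied votes, which can occur with K = 2.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    # train: list of (point, label) pairs; query: a point (tuple).
    # Classify the query by a majority vote among its k nearest
    # neighbours under Euclidean distance.
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Because KNN simply stores all training cases and defers work to query time, there is no training step, which is exactly the "stores all available cases" behaviour the slides describe.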
Example 2
⦿ Using rectilinear (city-block) distance:
⦿ After Iteration 1:
⦿ Iteration 1 : graphical representation
⦿ Iteration 2 :
⦿ Iteration 2 : graphical representation
⦿ Updating centroids of each cluster
⦿ Iteration 3
⦿ Iteration 3 : Graphical representation
⦿ Iteration 4:
⦿ Iteration 4 produces the same clusters as iteration 3, so the algorithm stops: the centroids have converged.
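The iterate-until-unchanged loop traced above can be sketched as follows. It uses rectilinear distance for assignment, as in the worked example; the points and initial centroids in the usage note are illustrative, not the lecture's data.

```python
def kmeans(points, centroids, max_iter=100):
    # Assign each point to its nearest centroid (rectilinear distance,
    # as in the worked example), recompute each centroid as the mean of
    # its cluster, and stop once an iteration reproduces the previous
    # assignment -- the "Iteration 3 = Iteration 4, so stop" condition.
    dist = lambda p, c: sum(abs(a - b) for a, b in zip(p, c))
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(len(centroids)),
                              key=lambda i: dist(p, centroids[i]))
                          for p in points]
        if new_assignment == assignment:  # unchanged: converged
            break
        assignment = new_assignment
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return assignment, centroids
```

For example, `kmeans([(1, 1), (2, 1), (4, 3), (5, 4)], [(1, 1), (2, 1)])` groups the first two and last two points together after a few iterations.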
• Agglomerative clustering keeps merging the closest clusters until all of them are combined into a single cluster that contains every data point.
Example 2
⦿ We have six points in the data set. Apply hierarchical clustering, using Euclidean distance as the measure.
⦿ Using Hierarchical Clustering :
⦿ Cutting the resulting dendrogram at different thresholds yields different numbers of clusters.
• The closest distance between two clusters is what drives the merging in (single-linkage) hierarchical clustering.
⦿ dist((P3, P6), P2) = min[dist(P3, P2), dist(P6, P2)] (single-linkage update)
⦿ Iteration 2: Updated Matrix
⦿ Min. value = 0.14, so P2 and P5 form one cluster
⦿ Updating matrix for (P2,P5)
⦿ Iteration 3 : updated matrix
⦿ Min. value = 0.15, so (P3, P6) and (P2, P5) merge into one cluster
⦿ Iteration 4 :
⦿ Iteration 4 updated matrix
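The merge-the-closest-pair procedure walked through above can be sketched as below. This is an illustrative single-linkage implementation (our own naming, not the lecture's); it records each merge and the distance at which it happened, which is the information a dendrogram plots.

```python
import math

def single_linkage(points):
    # Agglomerative clustering: start with singleton clusters, then
    # repeatedly merge the closest pair, where the distance between two
    # clusters is the minimum pairwise Euclidean distance (single link).
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def link(ci, cj):
        # single-linkage distance between two clusters
        return min(dist(p, q) for p in ci for q in cj)

    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], link(clusters[i], clusters[j])))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

With n points the loop performs n - 1 merges, matching the slides' observation that the process ends with a single cluster containing everything.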
Divisive Clustering
⦿ Divisive clustering is a top-down method that works similarly to agglomerative clustering but in the opposite direction: it starts with all points in one cluster and recursively splits it into smaller clusters.
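The top-down direction can be sketched as below. The splitting rule here (cut the widest cluster at the midpoint of its widest dimension) is a deliberately simple illustration of our own choosing; practical divisive methods often use bisecting k-means instead.

```python
def divisive(points, k):
    # Top-down (divisive) clustering sketch: start with every point in
    # one cluster and repeatedly split the cluster with the largest
    # spread, at the midpoint of its widest dimension, until k clusters
    # remain.
    def width(cluster, dim):
        vals = [p[dim] for p in cluster]
        return max(vals) - min(vals)

    clusters = [list(points)]
    while len(clusters) < k:
        # pick the splittable cluster with the largest spread
        target = max((c for c in clusters if len(c) > 1),
                     key=lambda c: max(width(c, d) for d in range(len(c[0]))))
        dim = max(range(len(target[0])), key=lambda d: width(target, d))
        mid = (max(p[dim] for p in target) + min(p[dim] for p in target)) / 2
        clusters.remove(target)
        clusters.append([p for p in target if p[dim] <= mid])
        clusters.append([p for p in target if p[dim] > mid])
    return clusters
```

Note the symmetry with the agglomerative version: one merges until a single cluster remains, the other splits until the desired number of clusters is reached.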
Anomaly Detection
• Anomaly detection is the task of spotting items that stand out from the background. Humans and animals do this habitually, e.g. when they spot a ripe fruit in a tree, or a rustle in the grass that stands out from the background and could represent an opportunity or a threat.
• A business analogue is finding a store, product, or salesperson that is performing much better than the others and should be investigated for insight into improving the business.
Application Areas
• Company data
• Medical problems
• Malfunctioning equipment
• Faults in machines
⦿ Point anomaly