
Module 3

Unsupervised Learning
Contents
⦿ Pattern classification by distance function: Measures of similarity
⦿ Clustering criteria
⦿ K-means clustering
⦿ Pattern classification by likelihood function
⦿ Pattern classification as a statistical decision problem
⦿ Bayes classifier for normal patterns
Pattern classification by distance
function: Measures of similarity
- KNN Classifier
Intuition of KNN Classifier
(Measure of Dissimilarity)
⦿ Let O1 and O2 be two objects from the universe.
⦿ The distance (dissimilarity) between them is D(O1, O2).
⦿ K nearest neighbors (KNN) is a simple algorithm that stores all
available cases and classifies new cases based on a similarity
measure (e.g., distance functions).
⦿ KNN is a non-parametric method.
Distance function for numerical
attributes
⦿ Euclidean distance
⦿ City block distance (Manhattan distance)
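A minimal sketch of the two distance functions in Python, assuming each pattern is given as a list of numeric attribute values:

import math

def euclidean(p, q):
    # Straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City block distance: sum of the absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7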
Different distance measures
KNN Algorithm
⦿ A case is classified by a majority vote of its neighbors, with the
case being assigned to the class most common amongst its K
nearest neighbors measured by a distance function.

⦿ If K = 1, then the case is simply assigned to the class of its
nearest neighbor.
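The majority-vote rule above can be sketched as follows; the training pairs are hypothetical, and ties are broken by whichever class Counter encounters first:

import math
from collections import Counter

def knn_classify(train, query, k):
    # Keep the k stored cases closest to the query (Euclidean distance)
    neighbors = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    # Majority vote among the labels of those k neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical stored cases: (pattern, class label)
train = [([1, 1], "A"), ([1, 2], "A"), ([5, 5], "B"), ([6, 5], "B")]
print(knn_classify(train, [2, 1], k=3))  # -> "A"

With k=1 the same function implements the nearest-neighbor rule.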
Choosing Value of K
⦿ Choosing the optimal value for K is best done by first
inspecting the data.
⦿ In general, a larger K value reduces the effect of noise on the
classification, but there is no guarantee it improves accuracy.
⦿ Cross-validation is another way to retrospectively determine a
good K value by using an independent dataset to validate the K
value (see the sketch below).
⦿ For most datasets the optimal K has been between 3 and 10, which
generally produces much better results than 1-NN.
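A sketch of the cross-validation approach with scikit-learn; the Iris data is only a stand-in for whatever dataset is at hand:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Score each candidate K with 5-fold cross-validation
scores = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])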
Example 1: Classify C
⦿ Classify C:
Example 2 : KNN Classifier
Example 3
⦿ Consider the following data concerning credit default. Age and Loan are two
numerical variables (predictors) and Default is the target.
⦿ Input pattern: an unknown case at (Age=48, Loan=$142,000)
⦿ Use Euclidean distance as the measure
⦿ Choose K. First select K=1.
⦿ Output of the algorithm for K=3:
⦿ Conclusion:
With K=3, two of the three closest neighbors have Default=Y and
one has Default=N. The prediction for the unknown case is again
Default=Y.
⦿ Drawback:
- Finding the K nearest neighbours for every new sample is
computationally expensive.
Clustering
⦿ Clustering can be considered the most important unsupervised
learning problem;
⦿ it deals with finding a structure in a collection of unlabeled
data.
⦿ Definition: the process of organizing objects into groups whose
members are similar in some way.
⦿ A cluster is therefore a collection of objects that are
“similar” to one another and “dissimilar” to the objects
belonging to other clusters.
⦿ Goal of clustering:
- to determine the intrinsic grouping in a set of unlabeled data.
- How do we decide what constitutes a good clustering?
- It can be shown that there is no absolute “best” criterion that
is independent of the final aim of the clustering.
- Consequently, it is the user who must supply this
criterion, in such a way that the result of the clustering
suits their needs.
Example of clustering
⦿ Group these characters
⦿ Clustering criteria 1
⦿ Clustering criteria 2
Applications
⦿ Shopping websites
⦿ Marketing
⦿ Biology
⦿ Library
⦿ City planning
⦿ Insurance
⦿ WWW
⦿ Earthquake studies
Types of clustering
⦿ K-means clustering
⦿ Hierarchical clustering
⦿ Agglomerative clustering (Bottom up approach)
⦿ Divisive Clustering
K-means Clustering Basic
Idea
K-means within-cluster and
between-cluster distances
⦿ Within- and between-cluster distances
⦿ Centroids: the centre of each cluster
⦿ B(C): distances between centroids
⦿ W(C): distances within clusters
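In sum-of-squares form these quantities are commonly written as follows, assuming K clusters C_k with centroids \mu_k, overall mean \mu, and n_k points in cluster k:

W(C) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2

B(C) = \sum_{k=1}^{K} n_k \lVert \mu_k - \mu \rVert^2

A good K-means clustering makes W(C) small (tight clusters) and B(C) large (well-separated centroids).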
Example : cardiac data
⦿ Each data point represents a patient
⦿ Two clusters by human perception: blue and red
⦿ The problem: real-life data is rarely this easy to cluster by inspection
Flow chart of K-means
Clustering
K-means clustering example
1) Initialization: start from the raw, unclustered data points
2) Decide K; here K = 2. Assign K centroids at random.
3) Calculate B(C) and decide the boundary line
4) Assign the data points to their clusters
Iteration 2
5) W(C) is calculated, and based on the average W(C) each centroid
is moved to the centre of its cluster
6) The same process is repeated: calculate B(C) and decide the
boundary line
⦿ Iteration 1
⦿ Iteration 2
⦿ Clusters after Iteration 1
⦿ Clusters after Iteration 2
Algorithm
Steps of K-means Clustering
Summarized Algorithm
In summarized form:
⦿ The K-means algorithm aims to minimize the objective function below
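With cluster centres c_j, the objective is the within-cluster sum of squared distances, i.e. the W(C) defined earlier:

J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2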
Deciding value of K
When to use K-means
clustering?
Flow Chart
• Apply K-means clustering on this dataset

• Consider K = 2
Example 2
⦿ Using rectilinear (city block) distance:
⦿ After Iteration 1:
⦿ Iteration 1 : graphical representation
⦿ Iteration 2 :
⦿ Iteration 2 : graphical representation
⦿ Updating centroids of each cluster
⦿ Iteration 3
⦿ Iteration 3 : Graphical representation
⦿ Iteration 4:
⦿ Iteration 3 = Iteration 4, so stop

⦿ K-means clustering ends at the point where the grouping of
data points does not change from one iteration to the next.
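The assign-update-repeat loop above can be sketched in NumPy as follows; initial centroids are random data points, and the loop stops when the centroids stop moving (empty clusters are not handled in this sketch):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the clusters no longer change between iterations
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.0]])
labels, centroids = kmeans(X, k=2)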
Hierarchical Clustering
• The hierarchy of clusters is developed in the form of a tree,
and this tree-shaped structure is known as the dendrogram.
• The results of K-means clustering and hierarchical clustering
may look similar, but the two differ in how they work.
• There is no requirement to predetermine the number of
clusters, as there is in the K-means algorithm.
Types
• Agglomerative: a bottom-up approach, in which the
algorithm starts by taking all data points as single clusters and merges
them until one cluster is left.

• Divisive: the reverse of the agglomerative algorithm,
i.e. a top-down approach.
Hierarchical Clustering –
Agglomerative Clustering
⦿ Bottom Up Approach
⦿ Finding dissimilarity between clusters
⦿ Need :
⦿ Comparison with K-means clustering
Agglomerative Clustering - Steps
• This algorithm treats each data point as a single cluster at the
beginning, and

• then starts combining the closest pairs of clusters.

• It does this until all the clusters are merged into a single cluster
that contains the entire dataset.
Example 2
⦿ We have six points in the data set. Use hierarchical clustering
with Euclidean distance (ED) as the measure.
⦿ Using hierarchical clustering:
⦿ Using different thresholds, one can find the number of clusters

⦿ The general way is to choose the midpoint of the longest branch

Dendrogram
Agglomerative Algorithm
Steps in detail
• Step-1: Create each data point as a single cluster. Let's say there
are N data points, so the number of clusters will also be N.
• Step-2: Take two closest data points or clusters and merge them
to form one cluster. So, there will now be N-1 clusters.
• Step-3: Again, take the two closest clusters and merge them
together to form one cluster. There will be N-2 clusters.
• Step-4: Repeat Step 3 until only one cluster is left. So, we will get
the following clusters.
• Step-5: Once all the clusters are combined into one big cluster,
develop the dendrogram to divide the clusters as per the problem.
Measure for the distance between two
clusters

• How the distance between two clusters is measured is crucial for
hierarchical clustering.

• These measures are called linkage methods (formulas follow the list below).

• Single Linkage: It is the Shortest Distance between the closest
points of the clusters. Consider the below image:
• Complete Linkage: It is the farthest distance between the two points of
two different clusters. It is one of the popular linkage methods as it forms
tighter clusters than single-linkage.
• Average Linkage: It is the linkage method in which the distance
between each pair of datasets is added up and then divided by the
total number of datasets to calculate the average distance between
two clusters. It is also one of the most popular linkage methods.
• Centroid Linkage: It is the linkage method in which the distance
between the centroid of the clusters is calculated.
• One can apply any of them according to the type of problem or
business requirement.
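In formula form, for clusters A and B with points a and b, distance d, and centroids \mu_A and \mu_B, the standard definitions are:

d_{single}(A, B) = \min_{a \in A,\, b \in B} d(a, b)

d_{complete}(A, B) = \max_{a \in A,\, b \in B} d(a, b)

d_{average}(A, B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a, b)

d_{centroid}(A, B) = d(\mu_A, \mu_B)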
Working of the Dendrogram
• The dendrogram is a tree-like structure that is mainly used to record
each merge step that the HC algorithm performs.

• In the dendrogram plot, the y-axis shows the Euclidean distances
at which the data points merge, and the x-axis shows all the data points
of the given dataset.
• Draw the dendrogram (see the SciPy sketch below)
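A minimal sketch of building and plotting a dendrogram with SciPy; the data points are placeholders:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[1, 2], [2, 2], [8, 8], [8, 9], [1, 1]])  # placeholder points

Z = linkage(X, method="single", metric="euclidean")  # agglomerative merge steps
dendrogram(Z)  # y-axis: merge distances, x-axis: data points
plt.show()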
Example 3
⦿ Use Agglomerative clustering on this data set
⦿ Iteration 1
⦿ Iteration 2:
⦿ Iteration 3:
Measuring similarity between
clusters
Cluster Distances
Cluster Distance Measure
⦿ Different linkages:
⦿ Iteration 1
⦿ Iteration 2
⦿ Iteration 3
⦿ Iteration 4
⦿ In this way the algorithm continues.
Example 4
⦿ Apply hierarchical clustering to this data set using centroid
linkage. Draw the dendrogram.
⦿ Hierarchical Clustering
Example 5
⦿ Find clusters using the single-link technique. Use Euclidean
distance (ED) as the measure. Draw the dendrogram.
⦿ Draw Graph
⦿ The distance matrix:
⦿ P3 and P6 form one cluster, as the distance between them is the
minimum
⦿ Update the distance matrix using the single-link rule:
dist[(P3, P6), P1] = min[dist(P3, P1), dist(P6, P1)]

dist[(P3, P6), P2] = min[dist(P3, P2), dist(P6, P2)]
⦿ Iteration 2: updated matrix
⦿ Minimum value = 0.14, so P2 and P5 form one cluster
⦿ Updating the matrix for (P2, P5)
⦿ Iteration 3: updated matrix
⦿ Minimum value = 0.15, so (P3, P6) and (P2, P5) form one
cluster
⦿ Iteration 4 :
⦿ Iteration 4 updated matrix
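The matrix update used in each iteration above follows the single-link rule: the distance from the merged cluster to any other cluster is the minimum of the two old distances. A sketch, with illustrative distance values:

def merge_single_link(dist, a, b):
    # dist maps frozenset({c1, c2}) -> distance between clusters c1 and c2
    merged = (a, b)
    others = {c for pair in dist for c in pair} - {a, b}
    # Keep all entries that involve neither a nor b
    new_dist = {p: d for p, d in dist.items() if a not in p and b not in p}
    for c in others:
        # Single link: min of the distances from a to c and from b to c
        new_dist[frozenset({merged, c})] = min(dist[frozenset({a, c})],
                                               dist[frozenset({b, c})])
    return new_dist

d = {frozenset({"P3", "P6"}): 0.11, frozenset({"P3", "P2"}): 0.50,
     frozenset({"P6", "P2"}): 0.40}
print(merge_single_link(d, "P3", "P6"))  # {(P3,P6) vs P2: 0.40}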
Divisive Clustering
⦿ It is a top-down clustering method which works in a similar
way to agglomerative clustering but in the opposite direction.

⦿ This method starts with a single cluster containing all objects
and then successively splits the resulting clusters until only
clusters of individual objects remain.
Anomaly Detection
• Anomaly detection is the process of identifying data points, entities or events that
fall outside the normal range.

• An anomaly is anything that deviates from what is standard or expected.

• Humans and animals do this habitually when they spot a ripe fruit in a tree or a
rustle in the grass that stands out from the background and could represent an
opportunity or threat.

• Thus, the concept is sometimes framed as outlier detection or novelty detection.


• Anomaly detection is often used to detect suspicious events, unexpected opportunities
or bad data buried in time series data.

• A suspicious event might indicate a network breach, fraud, crime,
disease or faulty equipment.

• An unexpected opportunity could involve finding a store, product or
salesperson that is performing much better than others and should be
investigated for insight into improving the business.
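A minimal sketch of flagging point anomalies with z-scores; the data values and the 2.5-standard-deviation threshold are illustrative choices, not a universal rule:

import numpy as np

def zscore_anomalies(series, threshold=2.5):
    # Flag points more than `threshold` standard deviations from the mean
    x = np.asarray(series, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > threshold)[0]

transactions = [10, 11, 9, 10, 12, 10, 95, 11, 10]  # illustrative amounts
print(zscore_anomalies(transactions))  # -> [6], the suspicious 95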
Application Areas
• Company

• Bank Fraud Detection

• Shopping – With Discount

• Medical problems

• Malfunctioning equipment

• Faults in machines
• Point Anomaly

A tuple within the dataset is called a point anomaly if it is far
away from the rest of the data.

Example: a sudden transaction of a huge amount from a credit card.
• Contextual Anomaly

- Also called conditional outliers.

- If a particular observation is different from the other data points
in its context, then it is known as a contextual anomaly.

- In such types of anomalies, an anomaly in one context may not
be an anomaly in another context.
• Collective Anomaly

Collective anomalies occur when a collection of related data points is
anomalous with respect to the whole dataset, even though the individual
points may not be; such values are known as collective outliers.
DBSCAN
• Density-Based Spatial Clustering of Applications with Noise

• DBSCAN requires only two parameters: epsilon and minPoints.

• Epsilon is the radius of the circle to be created around each data
point to check the density.

• minPoints is the minimum number of data points required inside
that circle for that data point to be classified as a core point.
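A short sketch with scikit-learn's DBSCAN implementation; the eps, min_samples and point values are illustrative:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.0],   # dense region
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],   # dense region
              [4.0, 50.0]])                          # isolated point

model = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(model.labels_)  # [0 0 0 1 1 1 -1]; noise points are labelled -1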
Example : OCR
⦿ Suppose that a camera creates a digital image of a page of
text. Segmentation is first performed to determine the location
of each letter. Following this, the individual letters must be
classified correctly. Let the data set = {A, B, C, D, E, F, G, H},
which would ordinarily include all of the letters of the
alphabet.
⦿ Suppose that there are three different image processing
algorithms:
1) Shape extractor: this returns whether the letter is composed of
straight edges only or contains at least one curve.
2) End counter: this returns the number of segment ends.
For example, [...] has none and [...] has four.
3) Hole counter: this returns the number of holes enclosed
by the character. For example, [...] has none and [...] has one.
⦿ A mapping from letters to feature values (sketched below with
illustrative values):
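The slide's actual table does not survive in this text, but the idea can be sketched as follows; the (has_curve, ends, holes) triples are illustrative guesses, not the slide's values:

# Hypothetical feature table: letter -> (has_curve, segment_ends, holes)
FEATURES = {
    "A": (False, 2, 1),
    "B": (True, 0, 2),
    "C": (True, 2, 0),
    "D": (True, 0, 1),
}

def classify(has_curve, ends, holes):
    # Exact match of the measured feature vector against the stored table
    for letter, feats in FEATURES.items():
        if feats == (has_curve, ends, holes):
            return letter
    return None  # feature vector not found in the table

print(classify(True, 0, 1))  # -> "D"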
SVM
⦿ Concept
⦿ Linear
⦿ Non-linear
How does it work?
⦿ Identify Cat or Dog?
⦿ Support Vectors :
⦿ Linear SVM : Hyperplane
⦿ Non-linear SVM example
Non-linear SVM
⦿ Finding the equation for the support vectors:
⦿ Final classification result (see the sketch below)
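A minimal scikit-learn sketch of both variants; the toy points and labels (0 for one class, 1 for the other) are placeholders:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8], [8, 9]])
y = np.array([0, 0, 0, 1, 1, 1])  # e.g. 0 = cat, 1 = dog

linear_svm = SVC(kernel="linear").fit(X, y)  # separating hyperplane
rbf_svm = SVC(kernel="rbf").fit(X, y)        # kernel trick for curved boundaries

print(linear_svm.support_vectors_)           # the support vectors found
print(linear_svm.predict([[2, 2], [7, 8]]))  # -> [0 1]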
