UNIT4-UNSUPERVISED LEARNING
Introduction to clustering, K-Means clustering, K-Modes clustering,
Distance-based clustering, Clustering around medoids,
Silhouette analysis, Hierarchical clustering
Clustering is the task of dividing the population or data points into a number of groups such that data points in
the same group are more similar to each other than to data points in other groups. In other words, it is a
grouping of objects on the basis of the similarity and dissimilarity between them. A clustering algorithm tries to
find natural groups in the data on the basis of some similarity measure. Clustering divides data points into
homogeneous classes or clusters.
K-Means Clustering is an unsupervised learning algorithm which groups an unlabelled dataset into different
clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2,
there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that
divides the unlabelled dataset into k different clusters in such a way that each data point belongs to only one
group of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the
algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
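As a quick illustration, the following is a minimal K-Means sketch in Python using scikit-learn; the synthetic two-blob dataset and the choice K=2 are assumptions made only for this example.

import numpy as np
from sklearn.cluster import KMeans

# Toy unlabelled data: two Gaussian blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# K=2 clusters; the algorithm iteratively moves the centroids to minimize
# the sum of squared distances from points to their assigned centroid
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # one centroid per cluster
print(kmeans.labels_[:10])       # cluster assignment for the first 10 points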
K-mode clustering is an unsupervised machine-learning technique used to group a set of data objects into a
specified number of clusters, based on their categorical attributes. The algorithm is called “K-Mode” because it
uses modes (i.e. the most frequent values) instead of means or medians to represent the clusters.
K-Modes clustering is an iterative algorithm that starts by selecting k initial data points as centroids of the
cluster. After that, each data point in the dataset is assigned to a cluster based on its similarity with the
centroids. After creating the clusters for the first time, we select a new centroid in each cluster using the mode of
each feature in the cluster's data. After selecting the new centroids, we calculate their dissimilarity from each data
point and reassign the points to clusters. This process continues until it converges, i.e. there is no change to the
clusters in two consecutive iterations.
K-Modes clustering partitions the data into k mutually exclusive groups. Hence, it is termed a
partitioning clustering algorithm.
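A minimal sketch in Python, assuming the third-party kmodes package is installed (pip install kmodes); the toy categorical data and the chosen parameters are illustrative only.

import numpy as np
from kmodes.kmodes import KModes

# Toy categorical data: colour, size, shape
X = np.array([
    ["red",  "small", "round"],
    ["red",  "small", "oval"],
    ["blue", "large", "round"],
    ["blue", "large", "square"],
])

km = KModes(n_clusters=2, init="Huang", n_init=5)
labels = km.fit_predict(X)
print(labels)                  # cluster index per row
print(km.cluster_centroids_)   # the mode of each feature in each cluster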
K-MEDOIDS CLUSTERING:
K-Medoid clustering is a partitioning method used in cluster analysis, a technique used to classify a set of
objects into groups (or clusters) such that objects in the same group are more similar to each other than to
those in other groups.
K-Medoid clustering is an extension of the K-Means clustering algorithm, with the main difference being that K-
Medoid uses actual data points as cluster centers (medoids) instead of the means of the points in the cluster.
This makes K-Medoid more robust to outliers and noise in the data.
K-Medoids and K-Means are two types of partitioning clustering mechanisms. Clustering is the process of
breaking down an abstract group of data points/objects into classes of similar objects, such that all the objects
in one cluster have similar traits. In partitioning clustering, a group of n objects is broken down into k clusters
based on their similarities.
Medoid: A Medoid is a point in the cluster from which the sum of distances to other data points is minimal.
(or)
A Medoid is a point in the cluster from which dissimilarities with all the other points in the clusters are
minimal.
K-Medoids is an unsupervised method that works on unlabelled data. It is an improved version of the
K-Means algorithm, mainly designed to deal with K-Means' sensitivity to outliers. Compared to other partitioning
algorithms, the algorithm is simple, fast, and easy to implement.
Steps:
1. Initially select k random points as the medoids from the given n data points.
2. Assign each data point to the closest medoid using any distance metric, such as Manhattan distance.
3. Calculate the cost as the total sum of distances of the data points from their assigned medoids: c = ∑ d(Pi, Ci).
4. Swap one medoid point with one of the non-medoid points and repeat steps 2 and 3.
5. If the new cost is greater than the previous cost, undo the swap, conclude the process, and finalize the clusters;
otherwise repeat step 4.
Instead of centroids as reference points in K-Means algorithms, the K-Medoids algorithm takes a Medoid as a
reference point.
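The following is a minimal NumPy sketch of these steps (a PAM-style swap search with Manhattan distance); the function name, the toy data, and k=2 are assumptions made for illustration, not an optimized implementation.

import numpy as np

def k_medoids(X, k, max_iter=100, rng=np.random.default_rng(0)):
    n = len(X)
    medoids = rng.choice(n, size=k, replace=False)           # step 1: k random medoids
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)     # Manhattan distances (metric for step 2)

    def cost(meds):
        return dist[:, meds].min(axis=1).sum()               # step 3: total distance to nearest medoid

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):                                    # step 4: try swapping each medoid
            for p in range(n):
                if p in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = p
                c = cost(trial)
                if c < best:                                  # step 5: keep a swap only if cost decreases
                    best, medoids, improved = c, trial, True
        if not improved:
            break                                             # converged: no swap lowers the cost
    labels = dist[:, medoids].argmin(axis=1)                  # final cluster assignment
    return medoids, labels

# Example usage on toy 2-D data
X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [8.2, 7.9], [0.9, 2.1], [7.9, 8.1]])
medoids, labels = k_medoids(X, k=2)
print(medoids, labels)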
Hierarchical Clustering:
Hierarchical clustering is another unsupervised machine learning algorithm, used to group unlabeled datasets
into clusters; it is also known as hierarchical cluster analysis (HCA). In this algorithm, we develop the hierarchy
of clusters in the form of a tree, and this tree-shaped structure is known as a dendrogram. The results of
K-means clustering and hierarchical clustering may sometimes look similar, but the two differ in how they work.
1. Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with taking all data
points as single clusters and merging them until one cluster is left.
2. Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it is a top-down approach.
Agglomerative Clustering: This is the most common type of hierarchical clustering. It starts by considering
each data point as a single cluster and then successively merges or combines the closest pairs of clusters until
only one cluster remains. The algorithm proceeds iteratively, at each stage merging the two most similar
clusters, until all data points belong to a single cluster.
The choice of distance metric (to determine the similarity between clusters) and linkage criterion (to
determine which clusters to merge) are important decisions in agglomerative clustering.
• Calculate the distance matrix: Compute the pairwise distances between all data points. The choice of
distance metric (such as Euclidean distance, Manhattan distance, or others) depends on the nature of the data.
• Create clusters: Start by considering each data point as a single cluster.
• Merge or split clusters: For agglomerative clustering, merge the two closest clusters, and for divisive
clustering, split the cluster into smaller clusters.
• Update the distance matrix: Recalculate the distances between the new cluster(s) and the existing clusters
or data points.
• Repeat: Repeat the merge (or split) and distance-update steps until only a single cluster remains
(agglomerative) or until each data point is in its own cluster (divisive).
• Dendrogram: Create a dendrogram, which is a tree-like diagram that shows the arrangement of the clusters
produced by the hierarchical clustering algorithm. The height at which branches merge in the dendrogram
represents the distance between the clusters.
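As an illustration of these steps, here is a minimal agglomerative clustering sketch in Python using SciPy; the toy 2-D data, the Euclidean metric, and Ward linkage are example choices, not prescribed by the text.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Toy 2-D data: two obvious groups
X = np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.1],
              [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])

# Build the merge hierarchy bottom-up (agglomerative)
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# dendrogram(Z) draws the tree; the height at which branches merge
# reflects the distance between the merged clusters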
Agglomerative hierarchical clustering is a versatile clustering algorithm that can be applied to various fields
where grouping similar items together is important. Here are some suitable applications for agglomerative
hierarchical clustering:
Biology and Bioinformatics: Clustering genes or proteins based on their expression patterns or sequences can
help in understanding genetic similarities and evolutionary relationships.
Document Clustering: Grouping similar documents together based on their content can be useful in
information retrieval, topic modeling, and document organization.
Market Segmentation: Businesses can use hierarchical clustering to segment customers into different groups
based on their purchasing behaviour, preferences, or demographics.
Image Segmentation: In image processing, clustering can be used to segment images into meaningful regions
based on pixel similarities, aiding tasks like object recognition and tracking.
Social Network Analysis: Clustering users in social networks based on their interactions and interests can
reveal community structures and help in targeted marketing or content recommendation.
Anomaly Detection: Identifying outliers or anomalies in a dataset can be approached as a clustering problem.
Agglomerative hierarchical clustering can help identify clusters of normal behavior, making it easier to spot
unusual patterns.
Customer Segmentation: Businesses can use hierarchical clustering to segment their customers into different
groups based on their behaviour, purchasing patterns, and preferences. This information can be used for
targeted marketing strategies.
Speech Recognition: Clustering phonemes or speech patterns can help improve speech recognition systems by
grouping similar sounds together.
Recommendation Systems: Grouping users or items based on their preferences and behaviours can enhance
recommendation algorithms by suggesting products or content that similar users have liked.
Medicine and Healthcare: Agglomerative hierarchical clustering can be applied to medical data for patient
stratification, identifying subgroups of patients with similar disease characteristics, genetics, or treatment
responses.
Fraud Detection: Clustering credit card transactions or financial data can help in detecting unusual patterns
that might indicate fraudulent activities.
In the Silhouette algorithm, we assume that the data has already been clustered into k clusters by a clustering
technique. Silhouette analysis is used to check the quality of a clustering model by measuring the separation
between clusters. It basically provides a way to assess parameters such as the number of clusters with the
help of the silhouette score. This score measures how close each point in one cluster is to points in the
neighbouring clusters.
Silhouette analysis is a common method because it is more straightforward than many alternatives. Silhouette
analysis, or a silhouette plot, is often used with the K-Means algorithm to measure the separation between
clusters. K-Means clustering is a simple and popular unsupervised machine learning algorithm, and it can be
evaluated in two ways: the elbow technique and the silhouette method.
The silhouette score describes the nature of the clusters formed, taking values in the range [-1, 1].
A silhouette score of +1 indicates that a specific data point is distant from its neighbouring cluster and
very close to the cluster it is assigned to. In contrast, a value of -1 indicates that the point is closer to its
neighbouring cluster than to the cluster it is assigned to. A value of 0 means the data point most likely lies on
the boundary between the two clusters. A value of +1 is the ideal score for good clustering performance,
whereas -1 is the least preferred. However, a silhouette score of +1 is hard to achieve in real life when dealing
with unstructured and complex data.
The silhouette score is calculated from the mean intra-cluster distance a and the mean nearest-cluster
distance b for each sample, as s = (b - a) / max(a, b), with the condition that the number of labels is at least 2
and smaller than the number of samples.
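A minimal sketch of silhouette evaluation for K-Means in Python with scikit-learn; the synthetic data and the range of k values tried are illustrative.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy data: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

# Compare candidate numbers of clusters by their mean silhouette score
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))  # higher = better-separated clusters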
1. Euclidean Distance:
Formula: d(x, y) = ((x2 - x1)^2 + (y2 - y1)^2)^(1/2)
Example:
Points: A(1, 2), B(4, 6)
Distance = ((4 - 1)^2 + (6 - 2)^2)^(1/2)
= (9 + 16)^(1/2)
= 5
Used in:
K-Means
Agglomerative Clustering
t-SNE (for visualization)
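A quick check of the worked example in Python (points A and B as in the text):

import numpy as np

A = np.array([1, 2])
B = np.array([4, 6])
print(np.linalg.norm(A - B))   # Euclidean distance -> 5.0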
3. Minkowski Distance
Use when:
You want flexibility: it generalizes both Euclidean (p=2) and Manhattan (p=1)
You're experimenting with different "flavors" of distance
Tuning Tip:
Try p = 1.5 or p = 3 and compare clustering results.
Used in:
Any clustering that supports custom metrics (e.g., K-Medoids)
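A small sketch in Python comparing Minkowski distances for different p values using SciPy; the two points are illustrative.

from scipy.spatial.distance import minkowski

a, b = [1, 2], [4, 6]
for p in (1, 1.5, 2, 3):
    print(p, round(minkowski(a, b, p=p), 3))   # p=1 -> Manhattan, p=2 -> Euclidean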
4. Cosine Distance
Use when:
Your data is directional (angle matters, not magnitude)
Mostly used in text mining or recommender systems
Example:
Two documents with similar word distribution → Small angle → High similarity
Used in:
Text clustering (TF-IDF vectors)
Chatbot intent grouping
News/article topic clustering
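A minimal Python sketch of cosine distance on toy document vectors (the counts are illustrative stand-ins for TF-IDF weights):

from scipy.spatial.distance import cosine

doc1 = [3, 0, 1, 2]        # term weights for document 1
doc2 = [6, 0, 2, 4]        # same direction, larger magnitude
print(cosine(doc1, doc2))  # ~0.0 -> small angle, high similarity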
5. Hamming Distance
Use when:
Your data is binary or categorical
You want to compare bitstrings, options, or labels
Example:
A = "10101", B = "10011" → Distance = 2 (only 2 bits are different)
Used in:
DNA sequence comparison
Spam detection
Sensor failure detection (on/off signals)
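A minimal Python sketch of Hamming distance on the bitstrings from the example above:

def hamming(a: str, b: str) -> int:
    # Count positions where the two equal-length strings differ
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("10101", "10011"))   # -> 2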