
K-Means Clustering

Group-2
Thejaswi S
Samir K
Swathi G
Vatsalya K
Sruthi N
Overview of Clustering
• The task of grouping data points based on their similarity with each other is called Clustering or Cluster Analysis.
• Defined under Unsupervised Learning, which derives insights from unlabeled data without a target variable.
• Forms groups of homogeneous data points from a heterogeneous dataset.
• Evaluates similarity between points using metrics such as Euclidean Distance, Cosine Similarity, and Manhattan Distance; the sketch below computes each of these.

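These metrics drive every cluster assignment. A minimal sketch of the three (in Python with NumPy; the tooling is an assumption, since the slides name no libraries):

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean distance: straight-line distance between the points
euclidean = np.sqrt(np.sum((a - b) ** 2))                        # 5.0

# Manhattan distance: sum of absolute coordinate differences
manhattan = np.sum(np.abs(a - b))                                # 7.0

# Cosine similarity: cosine of the angle between the vectors
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(euclidean, manhattan, cosine)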

Types of Clustering
1. Centroid-based Clustering (Partitioning methods):
⚬ Groups data based on proximity, using metrics like Euclidean Distance.
⚬ Example algorithms: K-Means, K-Medoids.
2. Density-based Clustering:
⚬ Finds clusters in dense regions of the data, automatically determining the number of clusters.
⚬ Example algorithm: DBSCAN.
3. Connectivity-based Clustering (Hierarchical clustering):
⚬ Builds clusters hierarchically, creating a dendrogram (tree structure).
⚬ Two approaches: Agglomerative (Bottom-Up) and Divisive (Top-Down).
4. Distribution-based Clustering (Model-based methods):
⚬ Groups data points based on statistical probability distributions.
⚬ Example: Gaussian Mixture Model (GMM).
Each family has a standard implementation, sketched below.
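For reference, each family above has a scikit-learn implementation; the following sketch (assuming scikit-learn is installed) instantiates one algorithm from each family on the same toy data:

import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).rand(200, 2)  # toy 2-D dataset

labels_centroid = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_density = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)            # -1 marks noise
labels_hierarchy = AgglomerativeClustering(n_clusters=3).fit_predict(X)
labels_distribution = GaussianMixture(n_components=3, random_state=0).fit_predict(X)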
K-Means clustering:
• K-means clustering is an unsupervised machine learning algorithm used to partition a
dataset into K clusters, where each data point belongs to the cluster with the nearest
mean.
• It iteratively assigns each point to the closest cluster center, recalculates the cluster
centers, and repeats the process until convergence.
• The goal of clustering is to divide a dataset into groups (clusters) such that:
⚬ Data points within the same group are more similar to each other.
⚬ Data points from different groups are more different from each other.
• It’s about grouping data based on similarity and difference to reveal patterns or
insights in the data.
Key Concepts
• Centroids: Central points that represent the center of each cluster. They are
calculated as the mean of all points assigned to a cluster.
• Clusters: Groups of data points that are similar to each other based on proximity to a
centroid. The number of clusters is defined as K.
• Distance Metrics: Methods to calculate the similarity or dissimilarity between points.
⚬ Euclidean Distance: the most common metric, calculated as the straight-line distance between two points: d(p, q) = √Σᵢ(pᵢ − qᵢ)². The sketch below computes a centroid and this distance.
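A centroid is just the per-cluster mean, so both key concepts fit in a few lines of NumPy (a sketch; the points and labels are made up):

import numpy as np

points = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [8.0, 8.0]])
labels = np.array([0, 0, 0, 1])            # cluster assignment of each point

# Centroid of cluster 0: the mean of all points assigned to it
centroid_0 = points[labels == 0].mean(axis=0)      # [2.0, 3.0]

# Euclidean distance from a new point to that centroid
p = np.array([2.0, 2.0])
dist = np.linalg.norm(p - centroid_0)              # 1.0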
Algorithm Workflow:
• Step-1: Choose the number K of clusters.
• Step-2: Select K random points as initial centroids (they need not come from the input dataset).
• Step-3: Assign each data point to its closest centroid, forming the K clusters.
• Step-4: Recompute the centroid of each cluster as the mean of its assigned points.
• Step-5: Repeat Step-3, reassigning each data point to the new closest centroid.
• Step-6: If any reassignment occurred, go back to Step-4; otherwise stop.
• Step-7: The model is ready. (The sketch below implements this loop.)
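The steps above map directly onto a short from-scratch implementation. A minimal NumPy sketch (the function name and data are assumptions; it also assumes no cluster ever becomes empty):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: K is given; pick k distinct points from X as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign every point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Steps 5-6: stop when centroids (and hence assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)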
• Suppose we have a dataset with two variables, M1 and M2, shown as a scatter plot. We aim to divide it into K = 2 clusters.
• To start, we randomly select two points as centroids; they need not be part of the dataset. Next, we assign each data point to its nearest centroid by calculating the distance between the points.
• A median line (the perpendicular bisector between the two centroids) helps visualize this assignment.
• The center of gravity (mean) of the points assigned to each cluster becomes its new centroid.
• The assignment step is repeated with the new centroids, and data points are reassigned to their closest centroid.
• The process continues until no data point switches clusters, forming the final clusters.
• The assumed initial centroids are discarded, leaving the two final clusters.
Choosing the Number of Clusters (K)
• Elbow Method:
• Objective: find the optimal number of clusters (K) by evaluating how well the clusters fit the data.
• WCSS (Within-Cluster Sum of Squares) measures the total variation within the clusters:

WCSS = Σᵢ₌₁ᴷ Σ_{p in Cluster i} distance(p, Cᵢ)²

• The formula sums the squared distances between each data point p and the centroid Cᵢ of its own cluster, across all K clusters.
Choosing the Number of Clusters (K)
Steps:
• Perform K-Means clustering on the dataset for different K values (typically from 1 to 10).
• Calculate WCSS for each K value.
• Plot the WCSS values against the number of clusters (K).
• Identify the "elbow" point in the graph (the sharp bend); the K value at the elbow is considered optimal. (See the sketch below.)
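As a sketch of this procedure with scikit-learn (an assumed tool), whose inertia_ attribute is exactly the WCSS defined above:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)  # synthetic data

wcss = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss.append(km.inertia_)           # inertia_ is the WCSS for this K

plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("WCSS")
plt.title("Elbow Method")
plt.show()                             # look for the sharp bend (the elbow)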
Advantages
1. Simplicity and Efficiency:
⚬ Easy to implement.
⚬ Computationally efficient, even for large datasets.
2. Scalability:
⚬ Handles large datasets well.
3. Versatility:
⚬ Suitable for various data types.
⚬ Works well with well-separated clusters.
4. Flexibility:
⚬ Can be used for market segmentation, anomaly detection, and more.
Disadvantages
1. Choosing the Right K:
⚬ The optimal number of clusters (K) is hard to determine.
2. Sensitive to Initial Centroids:
⚬ The algorithm can converge to different solutions depending on the initial centroids (demonstrated in the sketch below).
3. Assumes Spherical Clusters:
⚬ Performs poorly when clusters are non-spherical or have different sizes and densities.
4. Sensitive to Outliers:
⚬ Outliers can significantly affect the clustering results, since centroids are means.
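Point 2 is easy to observe directly: a small sketch (scikit-learn assumed) runs K-Means with a single random initialization per seed and compares the resulting WCSS:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=7)

# n_init=1 disables scikit-learn's usual restarts, exposing the initialization effect
for seed in range(3):
    km = KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  WCSS={km.inertia_:.1f}")

# Different seeds can converge to different local optima (different WCSS);
# init="k-means++" and n_init > 1 are the standard mitigations.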
Applications
1. Customer Segmentation:
⚬ Grouping customers based on purchasing behavior for targeted marketing.
⚬ E-commerce platforms like Amazon or Flipkart use K-Means clustering to segment customers into categories such as "frequent buyers," "occasional shoppers," and "high-value customers."
⚬ Marketing teams design personalized ads, product recommendations, and discount strategies for each cluster.

2. Image Compression:
⚬ Image compression tools (e.g., TinyPNG) use K-Means to compress images without losing much quality.
⚬ Clustering pixel colors reduces the total number of colors in the image, saving memory and computational resources; in medical imaging, images (like X-rays) are likewise segmented into regions for efficient storage and analysis. (A sketch of this follows below.)
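Color quantization is the usual mechanism here: cluster the pixels in RGB space and replace each pixel with its centroid color, so only K colors remain. A minimal sketch (scikit-learn's bundled sample image is used purely as an assumed stand-in):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_sample_image

image = load_sample_image("china.jpg") / 255.0     # H x W x 3, values in [0, 1]
pixels = image.reshape(-1, 3)                      # one row per pixel

# K is the number of colors kept after compression
km = KMeans(n_clusters=16, n_init=4, random_state=0).fit(pixels)

# Replace every pixel with the centroid color of its cluster
compressed = km.cluster_centers_[km.labels_].reshape(image.shape)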
Applications
3. Document Clustering:
⚬ Categorizes a large collection of text documents into clusters/topics for easier retrieval, management, and understanding.
⚬ News organizations (e.g., BBC, Google News) use document clustering to group news articles into topics like "sports," "politics," "technology," etc.
⚬ In customer support systems, K-Means is used to group support tickets based on issue types for faster resolution.
4. Anomaly Detection:
⚬ Identifies outliers or unusual data points that do not conform to the general pattern of the dataset (sketched below).
⚬ Fraud Detection: banks and financial institutions use K-Means to identify fraudulent transactions by flagging transactions that deviate significantly from normal patterns.
⚬ Example: credit card purchases in unusual locations or at irregular times.
⚬ Cybersecurity: detecting unusual user behavior, such as login attempts from suspicious locations.
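A common K-Means pattern for anomaly detection is to flag points that lie unusually far from their nearest centroid; a minimal sketch (the data and the 3-sigma threshold are assumptions):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),          # one group of normal activity
               rng.normal(6, 1, (100, 2)),          # a second normal group
               [[3.0, 12.0], [-5.0, 8.0]]])         # two anomalous points

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of its own cluster
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Flag points whose distance is unusually large (3-sigma rule)
threshold = dists.mean() + 3 * dists.std()
outliers = np.where(dists > threshold)[0]           # indices of flagged points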
Comparison with Other Clustering Methods

Method                 | Needs K upfront?        | Cluster shapes handled   | Notes
K-Means                | Yes                     | Spherical, similar sizes | Fast and scalable; sensitive to outliers
DBSCAN                 | No                      | Arbitrary shapes         | Handles noise; needs density parameters
Hierarchical (Agglo.)  | No (cut the dendrogram) | Depends on linkage       | Produces a dendrogram; costly on large data
Gaussian Mixture Model | Yes (components)        | Elliptical               | Soft (probabilistic) assignments
Conclusion
• K-means Clustering is a powerful tool for grouping data into meaningful clusters.
• It is simple, easy to implement, and widely used in practice for tasks such as
segmentation and anomaly detection.
• Choosing K (number of clusters) is a critical step; methods like the Elbow Method can
help.
• While efficient for large datasets, K-means has limitations like sensitivity to initial
centroids and assumptions about cluster shape.
• Despite its limitations, K-means remains a go-to algorithm for unsupervised learning
and exploratory data analysis.
