
Assignment 6 ML

The document discusses the K-Means clustering algorithm. It defines K-Means clustering as an unsupervised learning technique that groups unlabeled data points into K number of clusters, where each data point belongs to the cluster with the nearest mean. The document outlines the steps of the K-Means algorithm, which iteratively assigns data points to centroids and updates the centroids until cluster membership stabilizes. It also provides a diagram illustrating how K-Means clustering works.

Uploaded by

Mansi Todmal

Vidya Pratishthan’s Kamalnayan Bajaj Institute of Engineering and Technology, Baramati
Department of Computer Engineering
Assignment No:6
Roll Number: 2241072
Name of Student: Todmal Mansi
Subject: Machine Learning
Class: BE Computer

Title: Implement K-Means clustering / hierarchical clustering on the sales_data_sample.csv dataset.
Determine the number of clusters using the elbow method.
Dataset link: https://www.kaggle.com/datasets/kyanyoga/sample-sales-data

 K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in
machine learning and data science. This topic covers what the K-Means clustering algorithm is and how
it works, along with a Python implementation of K-Means clustering.

What is K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into
different clusters. Here K is the number of clusters to be created, chosen in advance:
if K=2 there will be two clusters, if K=3 there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each
data point belongs to only one group, made up of points with similar properties.

It lets us cluster the data into different groups and is a convenient way to discover the categories of
groups in an unlabeled dataset on its own, without the need for any labeled training data.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of squared distances between the data points and the centroids of their
corresponding clusters. The algorithm takes the unlabeled dataset as input, divides it into K clusters, and
repeats the process until the cluster assignments stop changing. The value of K must be chosen before the
algorithm runs.
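
The objective described above is commonly written as the within-cluster sum of squares (WCSS). The notation here is an assumption introduced for illustration and does not come from the assignment: C_j is the set of points assigned to cluster j, and mu_j is the centroid of that cluster.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Each K-Means iteration either lowers J or leaves it unchanged, which is why the algorithm eventually stops.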

The k-means clustering algorithm mainly performs two tasks:

o Determines the best positions for the K center points (centroids) through an iterative process.
o Assigns each data point to its closest centroid. The data points nearest to a particular centroid
form a cluster.

Hence each cluster contains data points with some commonalities and lies away from the other clusters.
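
The second task, assigning each point to its closest centroid, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name assign_to_nearest and the toy points and centroids are made up for this example and are not part of the assignment or its dataset.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid (Euclidean distance)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Toy 2-D data, invented for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

The first two points fall into the cluster of centroid 0 and the last two into the cluster of centroid 1, since each is far closer to that centroid than to the other.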

The diagram below illustrates the working of the K-Means clustering algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Choose the number K, which decides the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which forms the K predefined clusters.

Step-4: Compute the mean of the points in each cluster and place the new centroid of that cluster there.

Step-5: Repeat the third step, that is, reassign each data point to the new closest centroid.

Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
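
The steps above can be sketched as a small, dependency-free Python function. This is an illustrative sketch under stated assumptions, not the implementation used in the assignment: the name k_means, the max_iter and seed parameters, and the toy points are all hypothetical, and the initial centroids are drawn from the data points themselves (one common choice for Step-2).

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples.

    Returns (centroids, labels, wcss), where wcss is the within-cluster
    sum of squared distances.
    """
    rng = random.Random(seed)
    # Step 2: pick K data points at random as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 6: finish once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Toy data: two well-separated groups, invented for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, wcss = k_means(points, k=2)
print(labels, round(wcss, 3))
```

Running the same function for K = 1, 2, 3, ... and plotting the returned WCSS against K is the basis of the elbow method named in the assignment title: the K at which the curve stops dropping sharply is taken as the number of clusters.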
