K-means Clustering
Presented By: Akanksha Kaushik, Assistant Professor, The NorthCap University
Introduction to Unsupervised Machine Learning
• Unsupervised machine learning is the process of training a computer on unlabeled, unclassified data, so that the algorithm operates on that data without supervision.
• With no prior training on labeled examples, the machine’s job in this case is to organize unsorted data according to similarities, patterns, and differences.
Introduction to K-means Clustering
• K-means clustering is an unsupervised machine learning algorithm that groups an unlabeled dataset into clusters.
• It assigns each data point to one of K clusters depending on its distance from the cluster centers.
• It starts by placing the cluster centroids at random positions in the feature space.
• Each data point is then assigned to a cluster based on its distance from that cluster’s centroid.
• After every point has been assigned to a cluster, new cluster centroids are computed.
• This process runs iteratively until the clusters stabilize (or for a fixed number of iterations).
• In this analysis we assume that the number of clusters is given in advance, and each point must be placed in one of the groups.
• In some cases K is not clearly defined, and we must determine a suitable value of K.
• K-means performs best when the data is well separated. When the clusters overlap, it is not suitable.
• K-means is generally faster than many other clustering techniques.
• It produces tight clusters, with strong coupling between the data points in each cluster.
• K-means does not provide clear information about the quality of the clusters.
• Different initial assignments of the cluster centroids may lead to different clusters.
• K-means is also sensitive to noise, and it may get stuck in a local minimum.
Objective of k-means clustering
• The goal of clustering is to divide the population or set of data points into a number of groups so that the data points within each group are more similar to one another than to the data points in the other groups.
• It is essentially a grouping of things based on how similar or different they are to one another.
How does k-means clustering work?
• We are given a data set of items, each with certain features and values for those features (like a vector).
• The task is to categorize those items into groups.
• To achieve this, we use the K-means algorithm, an unsupervised learning algorithm.
• ‘K’ in the name of the algorithm represents the number of groups/clusters we want to classify our items into.
• The algorithm categorizes the items into k groups, or clusters, of similar items.
• To measure that similarity, we use the Euclidean distance.
• The algorithm works as follows:
• First, we randomly initialize k points, called means or cluster centroids.
• We assign each item to its closest mean, and we update that mean’s coordinates, which are the averages of the items assigned to that cluster so far.
• We repeat the process for a given number of iterations, and at the end we have our clusters.
• The “points” mentioned above are called means because they are the mean values of the items categorized in them.
• To initialize these means, we have several options.
• An intuitive method is to initialize the means at random items in the data set.
• Another method is to initialize the means at random values within the boundaries of the data set (if, for a feature x, the items have values in [0,3], we initialize the means with values for x in [0,3]).
• The above algorithm in pseudocode is as follows:
Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating the Euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items in that cluster
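The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation, and it makes a few assumptions that are not in the slides: the names euclidean and k_means, the seed parameter, and the sample points are all hypothetical. It initializes the means at random items in the data set, and it uses the common batch variant, which recomputes every mean once per pass over the items rather than after each individual assignment. It also stops early once the means no longer move.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(items, k, iterations=100, seed=0):
    """Batch k-means sketch (hypothetical helper): returns (means, labels)."""
    rng = random.Random(seed)
    # Initialize the means at k distinct random items from the data set.
    means = [list(p) for p in rng.sample(items, k)]
    labels = [0] * len(items)
    for _ in range(iterations):
        # Assignment step: each item goes to its closest mean.
        labels = [min(range(k), key=lambda j: euclidean(p, means[j]))
                  for p in items]
        # Update step: move each mean to the average of its assigned items.
        new_means = []
        for j in range(k):
            members = [p for p, lab in zip(items, labels) if lab == j]
            if members:
                new_means.append([sum(col) / len(members)
                                  for col in zip(*members)])
            else:
                # An empty cluster keeps its previous mean.
                new_means.append(means[j])
        if new_means == means:
            # No mean moved, so further passes would change nothing.
            break
        means = new_means
    return means, labels


# Illustrative data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
means, labels = k_means(points, k=2)
print(means)
print(labels)
```

Because the two groups in this sample are well separated, the first three points should end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialization, which reflects the point made earlier that different initial centroids may lead to different clusters.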