Assignment 3.1 K Means Clustering in Python PART 1
Topics to be covered:
By the end of this activity, you’ll be able to create the following GUI in Python:
If you run the code in Python, you’ll get this output, which matches our dataset:
Next, you’ll see how to use sklearn to find the centroids for 3 clusters and then for 4 clusters.
K-Means Clustering in Python – 3 clusters
Once you have created the DataFrame from the data above, you’ll need to import 2 additional
Python modules (matplotlib and sklearn):
In the code below, you can specify the number of clusters. For this example, assign 3 clusters
as follows:
kmeans = KMeans(n_clusters=3).fit(df)
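Since the original screenshots of the code are not included here, the following is a minimal, self-contained sketch of the 3-cluster step. It assumes the DataFrame has two numeric columns named x and y; the 12 points are hypothetical placeholder values, not the assignment's dataset. The n_init and random_state arguments are additions that only make the result repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical placeholder data (NOT the assignment's dataset):
# three loose groups of (x, y) points
data = {
    "x": [25, 28, 30, 27, 50, 53, 55, 52, 75, 78, 80, 77],
    "y": [20, 22, 18, 25, 60, 58, 62, 65, 30, 28, 33, 35],
}
df = pd.DataFrame(data, columns=["x", "y"])

# Fit k-means with 3 clusters; n_init/random_state only make the run repeatable
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

centroids = kmeans.cluster_centers_  # one (x, y) row per cluster
print(centroids)
print(kmeans.labels_)  # cluster index assigned to each row of df
```

The fitted model exposes the centroids as `cluster_centers_` and each observation's cluster as `labels_`; these are the values that the plots later draw.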
Run the code in Python, and you’ll see 3 clusters with 3 distinct centroids:
Note that the center of each cluster (in red) represents the mean of all the observations that
belong to that cluster.
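Written as a formula, with $C_j$ denoting the set of observations assigned to cluster $j$ and $\mu_j$ its center (notation introduced here for illustration), the center is the component-wise mean of its observations:

```latex
\mu_j = \frac{1}{|C_j|} \sum_{\mathbf{x}_i \in C_j} \mathbf{x}_i
```

For example, a cluster containing only the points (20, 10) and (30, 20) would have its center at ((20 + 30)/2, (10 + 20)/2) = (25, 15).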
You may also notice that the observations that belong to a given cluster are closer to the center of
that cluster than to the centers of the other clusters.
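Both properties described above can be checked numerically. This sketch again uses hypothetical placeholder points rather than the assignment's data. It recomputes each cluster's mean from `labels_` and compares it with `cluster_centers_`, then confirms that every point's nearest center is the one it was assigned to.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical placeholder points (not the assignment's data)
X = np.array([
    [25, 20], [28, 22], [30, 18], [27, 25],
    [50, 60], [53, 58], [55, 62], [52, 65],
    [75, 30], [78, 28], [80, 33], [77, 35],
], dtype=float)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
centers, labels = kmeans.cluster_centers_, kmeans.labels_

# 1) Each center equals the mean of the points assigned to it
means = np.array([X[labels == j].mean(axis=0) for j in range(3)])
print("centers are cluster means:", np.allclose(centers, means))

# 2) Each point is closer to its own center than to any other center
# dists[i, j] = Euclidean distance from point i to center j
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
print("nearest center matches label:", np.array_equal(dists.argmin(axis=1), labels))
```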
K-Means Clustering in Python – 4 clusters
Let’s now see what happens if you use 4 clusters instead. In that case, the only change needed
is to set n_clusters to 4 instead of 3:
kmeans = KMeans(n_clusters=4).fit(df)
Your full Python code for 4 clusters would then look like this:
Run the code, and you’ll now see 4 clusters with 4 distinct centroids:
As shown in the image above, the given data was entered and displayed by importing pandas
and loading it into a DataFrame.
In this Python program, the necessary libraries (pandas, matplotlib and sklearn) were imported.
Using the given data, kmeans is defined as KMeans(n_clusters=3).fit(df), which sets the number
of clusters to be displayed. Along with kmeans, the centroids were also defined; they are the
blue dots at the center of every cluster. Lastly, matplotlib (imported as plt) was used to draw
scatter plots of the clusters and their centroids.
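Here is a sketch of the plotting step, using hypothetical placeholder data and an assumed helper name, plot_clusters, that is not part of the original code. The observations are colored by their cluster label, and the centroids are drawn on top. The centroid color is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans


def plot_clusters(df, n_clusters):
    """Hypothetical helper: fit k-means and scatter-plot points and centroids."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(df)
    centroids = kmeans.cluster_centers_

    fig, ax = plt.subplots()
    # Observations, colored by the cluster label k-means assigned to them
    ax.scatter(df["x"], df["y"], c=kmeans.labels_, s=50, alpha=0.5)
    # Centroids drawn on top of the observations (color chosen arbitrarily)
    ax.scatter(centroids[:, 0], centroids[:, 1], c="red", s=50)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(f"K-Means with {n_clusters} clusters")
    return fig, ax


# Hypothetical placeholder data (not the assignment's dataset)
df = pd.DataFrame({
    "x": [25, 28, 30, 27, 50, 53, 55, 52, 75, 78, 80, 77],
    "y": [20, 22, 18, 25, 60, 58, 62, 65, 30, 28, 33, 35],
})
fig, ax = plot_clusters(df, 3)
```

Call `plt.show()` afterwards to open the plot window, or use `fig.savefig("clusters.png")` to write it to a file. Passing 4 instead of 3 produces the 4-cluster plot.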
In this Python code, n_clusters was changed from 3 to 4, which resulted in 4 clusters, each with
its own centroid.
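This sketch shows the 3-to-4 change side by side on hypothetical placeholder data. Only n_clusters differs between the two fits, and the number of returned centroids follows that value.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical placeholder data (not the assignment's dataset)
df = pd.DataFrame({
    "x": [25, 28, 30, 27, 50, 53, 55, 52, 75, 78, 80, 77],
    "y": [20, 22, 18, 25, 60, 58, 62, 65, 30, 28, 33, 35],
})

results = {}
for k in (3, 4):
    # The only difference between the two runs is n_clusters
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df)
    results[k] = kmeans
    print(f"n_clusters={k}: {len(kmeans.cluster_centers_)} centroids")
    print(np.round(kmeans.cluster_centers_, 2))
```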
Reflection:
Based on my observations, the number of clusters and centroids is determined by the value passed
to n_clusters. The values printed above the graphs are the x and y coordinates of the output
centroids. For example, in the last code, the first centroid is located at 42.86 on the x-axis and
17.29 on the y-axis.
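To read the centroid coordinates in this form, one option is to wrap `cluster_centers_` in a DataFrame that uses the data's column names and round to two decimals. This is a sketch on hypothetical placeholder data, so its numbers will not match the 42.86 and 17.29 reported above.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical placeholder data (not the assignment's dataset)
df = pd.DataFrame({
    "x": [25, 28, 30, 27, 50, 53, 55, 52, 75, 78, 80, 77],
    "y": [20, 22, 18, 25, 60, 58, 62, 65, 30, 28, 33, 35],
})
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(df)

# One row per centroid, labeled with the same x/y column names as the data
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=df.columns).round(2)
print(centroids)

# Coordinates of the first centroid (row 0)
first_x, first_y = centroids.iloc[0]
print(f"First centroid: x = {first_x}, y = {first_y}")
```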
Honor Pledge:
“I affirm that I have not given or received any unauthorized help on this assignment, and that this work is my own.”