
Assignment 3.1 K Means Clustering in Python PART 1

This document provides an example of K-Means clustering in Python. It introduces the key steps: creating a DataFrame from sample two-dimensional data, using scikit-learn to find centroids for clusters with n_clusters set to 3 and then 4, and plotting the results. The code imports necessary libraries, fits KMeans to the DataFrame specifying the number of clusters, and uses matplotlib to visualize the clusters and centroids as scatter plots. Changing n_clusters from 3 to 4 results in 4 clusters being identified instead of 3.

Uploaded by

Paul RJ Gonzales
Copyright
© All Rights Reserved

COLLEGE OF COMPUTER STUDIES

ITE 404 – Introduction to Data Science in Python

Name: Cerado, Dyra Jasmine Date: 03/21/23


Section: CS32S1 Program: BSCS Instructor: Ms. Paula Jean Castro-Mendoza
Assessment Task: Assignment 3.1 K-Means Clustering in Python PART 1

Example of K-Means Clustering in Python


PART 1:
K-Means Clustering is a concept that falls under Unsupervised Learning. This algorithm can be
used to find groups within unlabeled data. To demonstrate this concept, I’ll review a simple
example of K-Means Clustering in Python.

Topics to be covered:

● Creating the DataFrame for a two-dimensional dataset

● Finding the centroids for 3 clusters, and then for 4 clusters

● Adding a graphical user interface (GUI) to display the results

By the end of this activity, you’ll be able to create the following GUI in Python:

To start, let’s review a simple example with the following two-dimensional dataset:

You can then capture this data in Python using a pandas DataFrame:
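A minimal sketch of this step follows. The x and y values are hypothetical sample values chosen for illustration; they are not the dataset from the assignment.

```python
import pandas as pd

# Hypothetical two-dimensional sample values (an assumption, not the assignment's data).
data = {
    "x": [25, 34, 22, 27, 33, 60, 55, 58, 62, 57, 40, 45, 38, 43, 47],
    "y": [79, 71, 73, 78, 69, 45, 40, 50, 42, 48, 15, 10, 20, 12, 8],
}

# Each dictionary key becomes a column; each position in the lists becomes a row.
df = pd.DataFrame(data, columns=["x", "y"])

print(df)
```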

If you run the code in Python, you’ll get this output, which matches our dataset:

Next, you’ll see how to use sklearn to find the centroids for 3 clusters, and then for 4 clusters.

K-Means Clustering in Python – 3 clusters

Once you have created the DataFrame based on the above data, you’ll need to import 2 additional
Python modules:

● matplotlib – for creating charts in Python

● sklearn – for applying the K-Means Clustering in Python

In the code below, you can specify the number of clusters. For this example, assign 3 clusters
as follows:

KMeans(n_clusters=3).fit(df)
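
A complete script built around that call might look like the sketch below. The sample values are hypothetical, and `n_init` and `random_state` are added here only as an assumption to make the run repeatable; the marker sizes and transparency are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample values (not the assignment's dataset).
df = pd.DataFrame({
    "x": [25, 34, 22, 27, 33, 60, 55, 58, 62, 57, 40, 45, 38, 43, 47],
    "y": [79, 71, 73, 78, 69, 45, 40, 50, 42, 48, 15, 10, 20, 12, 8],
})

# Fit K-Means with 3 clusters; random_state fixes the initialization.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)

# One row per cluster: the x and y coordinates of its centroid.
centroids = kmeans.cluster_centers_
print(centroids)

# Color each observation by its cluster label, then overlay the centroids in red.
fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=kmeans.labels_, s=50, alpha=0.5)
ax.scatter(centroids[:, 0], centroids[:, 1], c="red", s=50)
```

In an interactive session, adding `plt.show()` as the last line opens the chart window.
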
Run the code in Python, and you’ll see 3 clusters with 3 distinct centroids:

Note that the center of each cluster (in red) represents the mean of all the observations that
belong to that cluster.

As you may also see, the observations that belong to a given cluster are closer to the center of
that cluster, in comparison to the centers of other clusters.
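
Both properties can be checked numerically. The sketch below assumes hypothetical sample values and a fixed `random_state`; it compares each centroid with the mean of its assigned observations and confirms that every observation is closest to its own cluster's center.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample values (not the assignment's dataset).
df = pd.DataFrame({
    "x": [25, 34, 22, 27, 33, 60, 55, 58, 62, 57, 40, 45, 38, 43, 47],
    "y": [79, 71, 73, 78, 69, 45, 40, 50, 42, 48, 15, 10, 20, 12, 8],
})

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(df)
centroids = kmeans.cluster_centers_

# Property 1: each centroid is the mean of the observations assigned to it.
# Grouping by the label array yields one row of means per cluster, ordered by label.
cluster_means = df.groupby(kmeans.labels_).mean().to_numpy()
means_match = bool(np.allclose(cluster_means, centroids))

# Property 2: each observation is nearer to its own centroid than to the others.
# distances[i, j] is the Euclidean distance from observation i to centroid j.
points = df.to_numpy(dtype=float)
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
all_nearest = bool((distances.argmin(axis=1) == kmeans.labels_).all())

print(means_match, all_nearest)
```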

K-Means Clustering in Python – 4 clusters

Let’s now see what would happen if you use 4 clusters instead. In that case, the only thing that
you’ll need to do is to change n_clusters from 3 to 4:

KMeans(n_clusters=4).fit(df)

And so, your full Python code for 4 clusters would look like this:
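
A minimal sketch of such a script, using hypothetical sample values rather than the assignment's dataset. As an assumption, `n_init` and `random_state` are set so that repeated runs give the same clusters.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical sample values (not the assignment's dataset).
df = pd.DataFrame({
    "x": [25, 34, 22, 27, 33, 60, 55, 58, 62, 57, 40, 45, 38, 43, 47],
    "y": [79, 71, 73, 78, 69, 45, 40, 50, 42, 48, 15, 10, 20, 12, 8],
})

# The only change from the 3-cluster version is n_clusters=4.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(df)
centroids = kmeans.cluster_centers_

# Print each centroid's coordinates: column 0 is x, column 1 is y.
for index, (cx, cy) in enumerate(centroids):
    print(f"centroid {index}: x={cx:.2f}, y={cy:.2f}")

# Observations colored by cluster label, centroids overlaid in red.
fig, ax = plt.subplots()
ax.scatter(df["x"], df["y"], c=kmeans.labels_, s=50, alpha=0.5)
ax.scatter(centroids[:, 0], centroids[:, 1], c="red", s=50)
```

In an interactive session, adding `plt.show()` as the last line opens the chart window.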

Run the code, and you’ll now see 4 clusters with 4 distinct centroids:

As shown in the image above, the given data was entered and displayed by importing pandas
and using a DataFrame.

In this Python program, the necessary libraries (pandas, matplotlib, and sklearn) were
imported. Using the given data, kmeans is defined as KMeans(n_clusters=3).fit(df), which
sets the number of clusters that will be displayed. The centroids were also obtained from
kmeans; they are the blue dots in the center of every cluster. Lastly, matplotlib (imported
as plt) was used to draw scatter plots of the clusters and centroids.

In this Python code, the value of n_clusters was changed from 3 to 4, which resulted in
4 clusters with their centroids.

Reflection:

Based on my observations, the number of clusters and centroids depends on the value
assigned to n_clusters. The values printed above each graph are the x and y coordinates of
the output centroids. For example, in the last code, the first centroid is located at 42.86
on the x-axis and 17.29 on the y-axis.

Honor Pledge:
“I affirm that I have not given or received any unauthorized help on this assignment, and that this work is my own.”
