
Kmeans Practice

This document provides instructions for practicing k-means clustering on customer data using scikit-learn in Python. It includes steps to load and visualize the data, run k-means clustering with k=3, find the optimal k value using the elbow method by computing distortion for k from 1 to 16, and train a final k-means model on the training data using the best k.

Uploaded by

luchi lovo

MSc AIBT : Machine Learning with Python

Practice 2 – kmeans
The dataset is available on Moodle (or by email).

1) Data Visualization
a) Load the database (Customers_practice.csv).

b) Print the first 10 rows of the dataset (with the head function). Determine the number of
examples and the number of features of the problem.

c) Display a scatter plot of the data. You should obtain the following expected result :
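Steps a) to c) can be sketched as follows. This is a minimal example under stated assumptions: the real Customers_practice.csv is not reproduced here, so a small inline CSV stands in for it, and the column names income and transactions are hypothetical (the sheet only hints at them later on).

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when working in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Customers_practice.csv (values and column names are assumptions).
csv_text = """income,transactions
15,39
16,81
17,6
18,77
40,42
42,50
45,48
70,91
72,12
75,88
78,15
80,90
"""

# With the real file: df = pd.read_csv("Customers_practice.csv")
df = pd.read_csv(io.StringIO(csv_text))

# b) first 10 rows, then number of examples and number of features
print(df.head(10))
n_examples, n_features = df.shape
print("examples:", n_examples, "features:", n_features)

# c) scatter plot of the two features
fig, ax = plt.subplots()
ax.scatter(df["income"], df["transactions"])
ax.set_xlabel("income")
ax.set_ylabel("transactions")
```

In a notebook, calling plt.show() (without the Agg line) displays the figure.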

2) K-means algorithm
Sklearn documentation available here: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

a) Test the kmeans algorithm with k=3, with random_state=0. Use the fit() function on
your dataset. Because there is no target column, you can use all of the Data to train
your model.


b) Once the model is trained, you can access the labels that k-means assigned to the data
through the labels_ attribute (see the documentation for a usage example).
Display the distinct classes assigned by k-means (use np.unique()).
c) You can access the centroids of the clusters through the cluster_centers_ attribute
(see the documentation for a usage example). Print them.
d) Redraw the scatter plot, this time colouring each point according to the label assigned
by k-means. You should obtain the following plot:

e) Explain why k=3 does not seem to be the appropriate number of clusters.
f) Find a way to plot the centroids on the plot. Be practical and create a function to plot
everything.
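One possible way to approach steps a) to f) is sketched below. It is not the expected solution: the customer data is replaced by synthetic points from make_blobs (an assumption, chosen with five groups so that k=3 visibly under-segments), n_init=10 is set explicitly only to keep the behaviour identical across scikit-learn versions, and plot_clusters is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when working in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the customer data (assumption): 200 points in 5 groups.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.8, random_state=0)

# a) fit k-means with k=3 on all the data (there is no target column)
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)

# b) distinct labels assigned by k-means
print(np.unique(kmeans.labels_))

# c) coordinates of the centroids, one row per cluster
print(kmeans.cluster_centers_)


def plot_clusters(X, model, ax=None):
    """Scatter the points coloured by cluster label and mark the centroids."""
    if ax is None:
        _, ax = plt.subplots()
    # d) one colour per label
    ax.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap="viridis", s=20)
    # f) centroids drawn on top as large red crosses
    centers = model.cluster_centers_
    ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200,
               label="centroids")
    ax.legend()
    return ax


ax = plot_clusters(X, kmeans)
```

Because the helper takes any fitted model, it can be reused unchanged for other values of k.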

3) Find the optimal value of k


Find in the documentation the attribute that gives the SSD (sum of squared distances) of a trained
k-means model.
a) Using the whole dataset, write a script that does the following:
- Find the optimal value of k using the elbow method (use the range [1, 16[, i.e. k = 1 to 15).

- Use the following parameters in the KMeans initialization: random_state=42 and init='k-means++'.
- Draw the elbow method plot (you should obtain the following plot)

- Conclude on the best value of k.
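The elbow procedure described above can be sketched as follows. The data is again a synthetic make_blobs stand-in (an assumption), and n_init=10 is an added parameter used only for consistency across scikit-learn versions. The sum of squared distances of a fitted KMeans model is exposed by its inertia_ attribute.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when working in a notebook
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the customer data (assumption): 300 points in 5 groups.
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.8, random_state=42)

ks = list(range(1, 16))  # the half-open range [1, 16[ : k = 1, ..., 15
ssd = []
for k in ks:
    model = KMeans(n_clusters=k, init="k-means++", random_state=42, n_init=10)
    model.fit(X)
    ssd.append(model.inertia_)  # sum of squared distances to the closest centroid

# Elbow plot: the "best" k is where the curve stops dropping sharply.
fig, ax = plt.subplots()
ax.plot(ks, ssd, marker="o")
ax.set_xlabel("k")
ax.set_ylabel("SSD (inertia)")
ax.set_title("Elbow method")
```

Reading the elbow remains a visual judgment; the script only produces the curve.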

b) Train a k-means model with the best value of k obtained before :


- Use random_state=42 and init='k-means++'.
- Draw the associated scatter plot.
- Observe and describe the resulting clusters in terms of the axes (e.g. cluster 1 contains the
customers with a low income but a high number of transactions).
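A minimal sketch of this final training step, under the same assumptions as before: synthetic make_blobs data instead of the customer file, an explicit n_init=10, and best_k = 5 chosen because the synthetic data has five groups (with the real data, use the value read off your own elbow plot). Printing each centroid is one simple way to start describing the clusters along each axis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the customer data (assumption): 300 points in 5 groups.
X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.8, random_state=42)

best_k = 5  # assumption: matches the synthetic data, not the real dataset
final_model = KMeans(n_clusters=best_k, init="k-means++", random_state=42,
                     n_init=10).fit(X)

# Summarise each cluster by its size and centroid position on both axes.
sizes = []
for label, center in enumerate(final_model.cluster_centers_):
    size = int(np.sum(final_model.labels_ == label))
    sizes.append(size)
    print(f"cluster {label}: {size} points, centroid = {np.round(center, 2)}")
```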

5) More
Load the test samples (Customers_practice_test.csv).

a) Use the k-means model trained with the optimal value of k (found in part 3) to predict the
clusters of the test samples just loaded.
b) Print the predictions
c) Plot the decision boundaries (here is an example with k=3)
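Steps a) to c) can be sketched as below. Everything about the data is an assumption: make_blobs points stand in for both CSV files, the last 30 points play the role of Customers_practice_test.csv, and k=5 and n_init=10 are illustrative choices. The decision boundaries are obtained by predicting the cluster of every point on a fine grid and colouring the grid accordingly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line when working in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in (assumption): 300 training points and 30 test points.
X, _ = make_blobs(n_samples=330, centers=5, cluster_std=0.8, random_state=42)
X_train, X_test = X[:300], X[300:]

model = KMeans(n_clusters=5, init="k-means++", random_state=42,
               n_init=10).fit(X_train)

# a) and b) assign each test sample to its nearest centroid and print the result
predictions = model.predict(X_test)
print(predictions)

# c) predict the cluster of every point on a regular grid covering the data
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))
grid = np.c_[xx.ravel(), yy.ravel()]
zz = model.predict(grid).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, zz, alpha=0.3, cmap="viridis")  # coloured regions
ax.scatter(X_test[:, 0], X_test[:, 1], c=predictions, cmap="viridis",
           edgecolors="k")                            # test samples
centers = model.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="X", s=200)
```

Since k-means assigns a point to its nearest centroid, the regions drawn this way are the Voronoi cells of the centroids.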
