Practical 5
Roll No : 23
Aim: Assignment on Clustering Techniques
Download the customer dataset from the link below:
Data Set: https://www.kaggle.com/shwetabh123/mall-customers
This dataset records the income of, and the money spent by, customers visiting a shopping mall. It contains Customer ID, Gender, Age, Annual Income and Spending Score. As the mall owner, you need to find the groups of customers who are most profitable for the mall. Apply at least two clustering algorithms (based on Spending Score) to find these groups.
a. Apply data pre-processing techniques (Label Encoding, Data Transformation, ...) if necessary.
b. Perform data preparation (Train-Test Split).
c. Apply a Machine Learning Algorithm.
d. Evaluate the Model.
e. Apply Cross-Validation and evaluate the Model.
Out[15]:
   CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
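The Label Encoding step from task (a) could look like the sketch below. It is only an illustration: the five rows are copied by hand from Out[15] so that the block runs on its own, and the names `df_head`, `encoder` and `Gender_encoded` are hypothetical, not taken from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The five rows shown in Out[15], typed in by hand so the sketch is self-contained.
df_head = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Age": [19, 21, 20, 23, 31],
    "Annual Income (k$)": [15, 15, 16, 16, 17],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# LabelEncoder sorts the class labels, so "Female" becomes 0 and "Male" becomes 1.
encoder = LabelEncoder()
df_head["Gender_encoded"] = encoder.fit_transform(df_head["Gender"])

gender_codes = df_head["Gender_encoded"].tolist()
print(df_head[["Gender", "Gender_encoded"]])
```

Gender is the only non-numeric column in the dataset, so it is the only one that needs encoding before a distance-based algorithm such as KMeans can use it.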
In [57]: df.describe()  # summary statistics for the numeric columns
Out[57]:
CustomerID Age Annual Income Spending Score
Out[60]:
CustomerID        0
Gender            0
Age               0
Annual Income     0
Spending Score    0
dtype: int64
Out[47]: (200, 5)
Out[49]:
Gender
Female    112
Male       88
Name: count, dtype: int64
In [82]: numeric_features_scaled = sc.fit_transform(numeric_features)  # scale the values
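The cell that creates `sc` and the one that builds `df_pca` are not visible in this export. A minimal sketch of how those steps might fit together is given below, assuming that `sc` is a `StandardScaler`, that `numeric_features` holds Age, Annual Income and Spending Score, and that `df_pca` is a two-component PCA projection (suggested by the PC1/PC2 axis labels used later). The `pca` object and the sample rows (taken from Out[15]) are illustrative.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Numeric columns of the first five customers from Out[15] (illustrative sample only).
numeric_features = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "Annual Income": [15, 15, 16, 16, 17],
    "Spending Score": [39, 81, 6, 77, 40],
})

# Assumption: sc is a StandardScaler (zero mean, unit variance per column).
sc = StandardScaler()
numeric_features_scaled = sc.fit_transform(numeric_features)

# After scaling, each column has mean 0 and population standard deviation 1.
column_means = numeric_features_scaled.mean(axis=0)
column_stds = numeric_features_scaled.std(axis=0)

# Assumption: df_pca is the scaled data projected onto two principal components.
pca = PCA(n_components=2)
df_pca = pd.DataFrame(pca.fit_transform(numeric_features_scaled), columns=["PC1", "PC2"])
print(df_pca)
```

Scaling matters here because KMeans uses Euclidean distance, so an unscaled column with a large range would dominate the clustering.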
In [99]: wcss_list = []
for i in range(1, 15):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state
In [101]: kmeans.fit(df_pca)
wcss_list.append(kmeans.inertia_)
# Example: Define wcss_list (replace this with your actual WCSS values)
wcss_list = [500, 300, 250, 200, 180, 175, 170, 160, 150, 145, 140, 135
# Plotting
plt.plot(range(1, len(wcss_list) + 1), wcss_list)
plt.plot([4, 4], [0, max(wcss_list)], linestyle='--', alpha=0.7) # Adjuste
plt.xlabel('K', fontdict=plt_font)
plt.ylabel('WCSS', fontdict=plt_font)
plt.title('Elbow Method for Optimal k', fontdict=plt_font)
plt.show()
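The plot above uses placeholder WCSS values. A sketch of computing real ones is shown below; the important detail is that both `fit` and the `inertia_` append happen inside the loop, once per k. The data is a synthetic stand-in generated with `make_blobs` (not the mall dataset), and `n_init=10` and `random_state=42` are illustrative choices rather than values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for df_pca: 200 two-dimensional points around 4 centres.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

wcss_list = []
for k in range(1, 15):
    # n_init and random_state are illustrative, not values from the notebook.
    kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)                      # fit once per candidate k
    wcss_list.append(kmeans.inertia_)  # inertia_ is the within-cluster sum of squares

for k, wcss in enumerate(wcss_list, start=1):
    print(f"k={k:2d}  WCSS={wcss:.1f}")
```

The resulting `wcss_list` has one entry per k from 1 to 14, so it can be passed to the plotting code unchanged; the elbow is the k after which the curve flattens.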
In [30]:
# Plot settings
plt.title("Clustered by KMeans", fontdict={'fontsize': 14, 'fontweight'
plt.xlabel("PC1", fontdict={'fontsize': 12})
plt.ylabel("PC2", fontdict={'fontsize': 12})
plt.show()
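The assignment asks for at least two clustering algorithms and a model evaluation, neither of which is visible in this export. One possible sketch follows, using k = 4 to match the dashed line drawn at K = 4 in the elbow plot. The choice of `AgglomerativeClustering` as the second algorithm and of the silhouette score as the metric are assumptions, and the data is again a synthetic stand-in from `make_blobs`, not the mall dataset.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for df_pca: 200 two-dimensional points around 4 centres.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

# Algorithm 1: KMeans with k = 4 (n_init and random_state are illustrative).
kmeans_labels = KMeans(
    n_clusters=4, init="k-means++", n_init=10, random_state=42
).fit_predict(X)

# Algorithm 2 (assumed choice): agglomerative clustering with default Ward linkage.
agglo_labels = AgglomerativeClustering(n_clusters=4).fit_predict(X)

# Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters.
kmeans_silhouette = silhouette_score(X, kmeans_labels)
agglo_silhouette = silhouette_score(X, agglo_labels)

print(f"KMeans silhouette:        {kmeans_silhouette:.3f}")
print(f"Agglomerative silhouette: {agglo_silhouette:.3f}")
```

Clustering has no ground-truth labels, so an internal metric such as the silhouette score is one way to compare the two algorithms on the same data.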