
Lab 11 - HT

The document outlines a task to segment customers based on Age, Annual Income, and Spending Score using K-Means Clustering. The optimal number of clusters (K) identified is 3, which provides the highest silhouette score, indicating well-defined customer segments. The analysis suggests that this segmentation can enhance targeted marketing strategies for different customer groups.

Uploaded by

Lehza Jafri

Lab 11 - Home tasks:

Home Task 1:
A company wants to segment its customers based on their Age, Annual Income, and Spending Score. The goal is to group the customers into distinct segments to improve targeted marketing strategies. Your task is to apply K-Means Clustering to segment the customers and evaluate the clustering quality using the Silhouette Score. What is the optimal value of K in your case?

CODE:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Step 1: Create a sample dataset
data = {
    'Age': [25, 34, 22, 45, 31, 41, 38, 29, 50, 35,
            21, 27, 43, 36, 33, 40, 48, 28, 24, 26],
    'Annual Income (k$)': [15, 40, 22, 80, 60, 75, 50, 35, 85, 58,
                           20, 30, 70, 55, 45, 68, 90, 38, 25, 29],
    'Spending Score': [39, 81, 6, 77, 40, 76, 50, 60, 85, 49,
                       10, 50, 73, 52, 39, 65, 80, 62, 14, 48]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Step 2: Convert to array for clustering
X = df.values

# Step 3: Try different values of K and compute silhouette scores
silhouette_scores = []
K_range = range(2, 11)

print("Silhouette Scores for different K values:\n")

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=0)
    labels = kmeans.fit_predict(X)
    score = silhouette_score(X, labels)
    silhouette_scores.append(score)
    print(f"K = {k} -> Silhouette Score = {score:.4f}")

# Step 4: Plot Silhouette Score vs K
plt.figure(figsize=(8, 5))
plt.plot(K_range, silhouette_scores, marker='o', linestyle='-', color='blue')
plt.title("Silhouette Score vs Number of Clusters (K)")
plt.xlabel("Number of Clusters (K)")
plt.ylabel("Silhouette Score")
plt.grid(True)
plt.tight_layout()
plt.show()

# Step 5: Fit KMeans with optimal K (highest score)
optimal_k = K_range[silhouette_scores.index(max(silhouette_scores))]
kmeans_final = KMeans(n_clusters=optimal_k, random_state=0)
labels_final = kmeans_final.fit_predict(X)

# Step 6: Add cluster labels to DataFrame
df['Cluster'] = labels_final

# Step 7: Print final DataFrame with clusters
print("\nCustomer Segmentation Result (with Cluster Labels):\n")
print(df)

OUTPUT:
Discussion & Analysis of results
Why K=3?

• K=3 gave the highest silhouette score among the K values tested (2 to 10).
• This suggests that three segments give the best balance of within-cluster similarity and between-cluster separation for this sample.
• Each cluster contains customers with similar:
  o Age
  o Income
  o Spending Score
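
One way to check that each cluster really groups customers with similar Age, Income, and Spending Score is to look at per-cluster averages. The sketch below uses a small made-up DataFrame with hand-assigned cluster labels (both are assumptions for illustration, not the lab's output). With the lab's `df`, the same `groupby` calls should apply once the `Cluster` column has been added in Step 6.

```python
import pandas as pd

# Hypothetical mini-dataset with hand-assigned cluster labels (illustration only)
demo = pd.DataFrame({
    'Age': [22, 24, 45, 50, 33, 35],
    'Annual Income (k$)': [20, 24, 80, 90, 50, 54],
    'Spending Score': [10, 14, 78, 84, 45, 51],
    'Cluster': [0, 0, 1, 1, 2, 2],
})

# Mean of every feature per cluster, plus the number of customers in each
profile = demo.groupby('Cluster').mean()
profile['Size'] = demo.groupby('Cluster').size()
print(profile)
```

Clusters whose averages differ clearly on at least one feature are easier to describe and act on than clusters that differ only slightly.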

Business Insight

• The segmentation helps the company create targeted marketing strategies for groups such as:
  o Budget shoppers
  o Premium customers
  o Average spenders
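
K-Means only returns numeric labels (0, 1, 2), so the segment names have to be attached afterwards. A minimal sketch of one possible rule follows, assuming the clusters are ranked by their mean Spending Score. The centroid values and the ranking rule are hypothetical and are not taken from the lab's output.

```python
# Hypothetical cluster centroids: (mean income in k$, mean spending score)
centroids = {0: (22.0, 12.0), 1: (85.0, 81.0), 2: (52.0, 48.0)}

# Rank cluster ids by mean spending score, lowest first
ranked = sorted(centroids, key=lambda c: centroids[c][1])

# Illustrative segment names, assigned from lowest to highest spending score
names = ['Budget shoppers', 'Average spenders', 'Premium customers']
segment_of = dict(zip(ranked, names))

for cluster_id, name in segment_of.items():
    print(f"Cluster {cluster_id}: {name}")
```

In practice the naming rule would depend on what the marketing team cares about; income or a combination of features could be used instead.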

Clustering Evaluation

• Silhouette Score ranges from -1 to 1:
  o Closer to 1 → well-defined clusters
  o Near 0 → overlapping clusters
  o Negative → points are likely assigned to the wrong cluster
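
To make the ranges above concrete, the silhouette value of a point can be computed by hand as s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below is a plain-Python illustration of that formula on four made-up points; it is not how `silhouette_score` is implemented internally, and the helper name `silhouette_values` is hypothetical.

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b) with Euclidean distance."""
    values = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by that point's cluster
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster or only one cluster: treat the value as 0
            values.append(0.0)
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        values.append((b - a) / max(a, b))
    return values

# Two tight pairs of points that are far apart from each other
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

good = silhouette_values(points, [0, 0, 1, 1])   # neighbours share a cluster
bad = silhouette_values(points, [0, 1, 0, 1])    # neighbours are split apart

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
print(f"sensible labels: {mean_good:.2f}")   # 0.80
print(f"shuffled labels: {mean_bad:.2f}")    # -0.39
```

The sensible labelling scores close to 1, while the shuffled labelling goes negative because every point sits nearer to the other cluster than to its own.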

Limitations

• K-Means is sensitive to outliers
• K-Means assumes roughly spherical clusters
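
The outlier sensitivity comes from the way K-Means places each cluster centre at the mean of its points. A small sketch with made-up (income, spending score) pairs shows how a single extreme value drags the centre away from the rest of the cluster; the numbers and the helper name `centroid` are assumptions for illustration.

```python
def centroid(points):
    """Coordinate-wise mean, which is how K-Means updates a cluster centre."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

# Hypothetical (income in k$, spending score) pairs for one cluster
cluster = [(50, 48), (52, 50), (54, 52)]
with_outlier = cluster + [(300, 50)]   # one customer with an extreme income

print(centroid(cluster))        # (52.0, 50.0)
print(centroid(with_outlier))   # (114.0, 50.0)
```

With the outlier included, the centre no longer sits near any of the typical customers, which can change which points get assigned to the cluster.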

Conclusion

• K-Means successfully segments customers using Age, Income, and Spending Score.
• Optimal K = 3 for this sample.
• Silhouette Score is a useful metric for evaluating cluster quality.
