Intro Qugates
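Before performing clustering, it is crucial to normalize the data, especially when features are on
different scales. Hierarchical clustering groups customers by the distances between them, so without
scaling, the feature with the largest numeric range dominates those distances. For example, age might
range from 18 to 70, whereas monthly_spending could range from 50 to 500.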
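The effect of unequal scales on distances can be seen with two pairs of customers: one pair differs only in age by 40 years, the other only in monthly_spending by 50. The customer values and the per-feature standard deviations below are assumptions chosen for illustration, not figures from the dataset. Subtracting the mean shifts both points equally and leaves distances unchanged, so dividing by the standard deviation is enough to show the effect.

```python
import numpy as np

# Hypothetical customers as (age, monthly_spending) pairs.
# Pair 1 differs only in age (40 years); pair 2 differs only in spending (50).
p1_a, p1_b = np.array([20.0, 200.0]), np.array([60.0, 200.0])
p2_a, p2_b = np.array([40.0, 200.0]), np.array([40.0, 250.0])


def euclid(u, v):
    """Plain Euclidean distance, the metric Ward linkage is built on."""
    return float(np.linalg.norm(u - v))


# On raw values, the 50-unit spending gap looks larger than a 40-year age gap
raw_1, raw_2 = euclid(p1_a, p1_b), euclid(p2_a, p2_b)

# Assumed standard deviations for (age, monthly_spending), for illustration only.
# Dividing by them expresses each gap in standard deviations instead of raw units.
std = np.array([15.0, 130.0])
scaled_1 = euclid(p1_a / std, p1_b / std)
scaled_2 = euclid(p2_a / std, p2_b / std)

print(f"raw:    age pair {raw_1:.1f}, spending pair {raw_2:.1f}")
print(f"scaled: age pair {scaled_1:.2f}, spending pair {scaled_2:.2f}")
# prints:
# raw:    age pair 40.0, spending pair 50.0
# scaled: age pair 2.67, spending pair 0.38
```

On raw values the spending pair looks farther apart; after scaling, the 40-year age gap is correctly treated as the larger difference.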
StandardScaler is used here to standardize each feature by subtracting its mean and dividing by its
standard deviation, so every feature ends up with zero mean and unit variance and the features become
comparable.
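A quick way to confirm what StandardScaler computes is to compare it with a hand-written z-score, z = (x - mean) / std. The six customers below are invented for illustration. One detail worth knowing: StandardScaler uses the population standard deviation (ddof=0), whereas pandas' `.std()` defaults to the sample standard deviation (ddof=1), so the manual version must pass `ddof=0` to match.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small hypothetical sample with features on very different scales
df = pd.DataFrame({
    "age": [18, 25, 33, 47, 52, 70],
    "monthly_spending": [50, 120, 210, 320, 410, 500],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(df)  # NumPy array, one column per feature

# Manual z-score with the population std (ddof=0), matching StandardScaler
manual = (df - df.mean()) / df.std(ddof=0)

print(scaler.mean_)   # per-feature means learned by fit
print(scaler.scale_)  # per-feature standard deviations learned by fit
print(np.allclose(scaled, manual.to_numpy()))  # → True
```

After scaling, each column has mean 0 and standard deviation 1, so a one-unit difference means the same thing (one standard deviation) for age and for monthly_spending.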
Clustering Process
Feature Selection: Only a subset of features (age, tenure, monthly_spending) is selected for
clustering.
Agglomerative Clustering:
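Uses linkage='ward', which at each step merges the pair of clusters whose union produces the smallest
increase in total within-cluster variance (Ward linkage works on Euclidean distances).
The n_clusters=4 parameter stops the merging once 4 clusters remain, so every customer is assigned to
one of 4 clusters.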
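The clustering step described above can be sketched end to end on synthetic data. The four group centres and noise levels for (age, tenure, monthly_spending) are invented for illustration. The sketch fits scikit-learn's `AgglomerativeClustering` with `linkage="ward"` and `n_clusters=4`, then builds the same hierarchy with SciPy's `linkage(..., method="ward")` and cuts it into 4 flat clusters with `fcluster`, which is the route a dendrogram-based analysis would take.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 loose customer groups in (age, tenure, monthly_spending)
rng = np.random.default_rng(0)
centers = np.array([
    [25.0, 1.0, 80.0],
    [35.0, 5.0, 250.0],
    [50.0, 10.0, 150.0],
    [65.0, 3.0, 450.0],
])
noise_scale = np.array([3.0, 1.0, 20.0])  # per-feature spread within a group
X_raw = np.vstack([c + rng.normal(0.0, noise_scale, size=(25, 3)) for c in centers])

# Standardize first so no single feature dominates the Euclidean distances
X = StandardScaler().fit_transform(X_raw)

# Ward linkage: repeatedly merge the two clusters whose union increases the
# total within-cluster sum of squares the least, stopping at 4 clusters
model = AgglomerativeClustering(n_clusters=4, linkage="ward")
labels = model.fit_predict(X)  # labels 0..3

# Same hierarchy with SciPy: build the full merge tree, then cut it into 4 clusters
Z = sch.linkage(X, method="ward")
scipy_labels = sch.fcluster(Z, t=4, criterion="maxclust")  # labels 1..4

print(np.bincount(labels))  # number of customers in each cluster
```

In scikit-learn, `linkage="ward"` only accepts Euclidean distances, which is one more reason to standardize the features first. The two label arrays are numbered differently (0 to 3 versus 1 to 4), but when no two merge heights tie, as with continuous data like this, they describe the same grouping.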
Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as sch
df = pd.DataFrame(data)
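# 1.3 Handle missing values by filling them with the mean of the column
# Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas
df['monthly_spending'] = df['monthly_spending'].fillna(df['monthly_spending'].mean())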
plt.tight_layout()
plt.show()
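# 1.5 Normalize the numerical columns (age, tenure, monthly_spending, num_products) using StandardScaler
num_cols = ['age', 'tenure', 'monthly_spending', 'num_products']
scaler = StandardScaler()
# Each column becomes (x - mean) / std, giving all features zero mean and unit variance
df_scaled = pd.DataFrame(scaler.fit_transform(df[num_cols]), columns=num_cols, index=df.index)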
# Step 2: Clustering Using Hierarchical Clustering
# 2.1 Select features for clustering (scaled data)
X = df_scaled[['age', 'tenure', 'monthly_spending']]