IMP Hierarchical Clustering

The document outlines a Python script for performing hierarchical and agglomerative clustering on a dataset containing individuals' ages and incomes. It includes data preprocessing steps, linkage matrix creation, and visualization of dendrograms, as well as the calculation of silhouette scores for clustering performance evaluation. The results show cluster labels and cophenetic correlation coefficients for both agglomerative and divisive clustering methods.
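The full pipeline the notebook walks through (linkage, tree cut, cophenetic correlation, silhouette score) can be sketched end to end on synthetic data. The two "age vs. income" blobs below are illustrative assumptions, not drawn from income.csv, which is not bundled here:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, cophenet
from scipy.spatial.distance import pdist
from sklearn.metrics import silhouette_score

# Two well-separated synthetic groups (hypothetical stand-in data).
rng = np.random.default_rng(0)
young = np.column_stack([rng.normal(28, 2, 10), rng.normal(65000, 5000, 10)])
older = np.column_stack([rng.normal(42, 2, 10), rng.normal(150000, 8000, 10)])
X = np.vstack([young, older])

Z = linkage(X, method='ward')                     # agglomerative merge history
labels = fcluster(Z, t=2, criterion='maxclust')   # cut the tree into 2 clusters
coph_corr, _ = cophenet(Z, pdist(X))              # tree's fidelity to raw distances
score = silhouette_score(X, labels)

print(labels)
print(round(coph_corr, 3), round(score, 3))
```

With groups this well separated, the two-cluster cut recovers the blobs and the silhouette score lands close to 1.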



December 22, 2024

[1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster, cophenet
from scipy.spatial.distance import pdist
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

[2]: # Sample dataset (replace this with your dataset)
df = pd.read_csv("income.csv")
df.head()

[2]: Name Age Income($)
0 Rob 27 70000
1 Michael 29 90000
2 Mohan 29 61000
3 Ismail 28 60000
4 Kory 42 150000

[3]: # Drop the non-numeric Name column before clustering
df = df.drop(['Name'], axis=1)
df.head()

[3]: Age Income($)
0 27 70000
1 29 90000
2 29 61000
3 28 60000
4 42 150000
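One caveat worth noting: the script clusters Age and Income($) on their raw scales, so Euclidean distances are driven almost entirely by income, which is orders of magnitude larger. Standardizing the features first is often advisable; a minimal sketch using scikit-learn's StandardScaler (not part of the original script) on the five rows shown above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import linkage

# The five Age/Income rows from the head of the dataset.
X = np.array([[27, 70000], [29, 90000], [29, 61000],
              [28, 60000], [42, 150000]], dtype=float)

# StandardScaler rescales each column to zero mean and unit variance,
# so Age and Income($) contribute comparably to distances.
X_scaled = StandardScaler().fit_transform(X)

Z = linkage(X_scaled, method='ward')  # linkage on the scaled features
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Whether to scale is a modeling choice; without it, the dendrogram below effectively clusters on income alone.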

[10]: # from scipy.cluster.hierarchy import dendrogram, linkage

linkage_matrix = linkage(df, method='ward')  # 'ward', 'single', 'complete', 'average'

dendrogram(linkage_matrix)
plt.show()

[5]: # Set a threshold for clustering (e.g., distance = 50)
# threshold = 50
# clusters = fcluster(linkage_matrix, t=threshold, criterion='distance')

# Add cluster labels to the dataset
# df['Cluster'] = clusters

# Display the dataset with clusters
# print(df)

[6]: # from scipy.cluster.hierarchy import cophenet
# from scipy.spatial.distance import pdist

# coph_corr, _ = cophenet(linkage_matrix, pdist(df))
# print(coph_corr)

[7]: # Agglomerative clustering with Ward linkage, evaluated by silhouette score
agglo = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels_agglo = agglo.fit_predict(df)
silhouette_agglo = silhouette_score(df, labels_agglo)
print("Agglomerative Clustering Labels:", labels_agglo)
print("Agglomerative Silhouette Score:", silhouette_agglo)

Agglomerative Clustering Labels: [0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0]
Agglomerative Silhouette Score: 0.8191238627089519

[8]: # "Divisive" clustering here is approximated by cutting the agglomerative
# (Ward) tree into a fixed number of clusters; SciPy does not implement a
# true top-down divisive algorithm such as DIANA.
Z = linkage(df, method='ward')

max_clusters = 3
labels_divisive = fcluster(Z, t=max_clusters, criterion='maxclust')

coph_corr, _ = cophenet(Z, pdist(df))  # cophenetic correlation coefficient

print("Divisive Clustering Labels:", labels_divisive)
print("Divisive Cophenetic Correlation Coefficient:", coph_corr)

Divisive Clustering Labels: [3 2 3 3 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 2 2 3]
Divisive Cophenetic Correlation Coefficient: 0.9472115279959762
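For symmetry with the agglomerative run, the three-cluster cut can also be scored with a silhouette coefficient. A sketch on hypothetical stand-in data (three synthetic age/income groups, since income.csv is not bundled with this notebook):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

# Three well-separated synthetic groups (illustrative parameters only).
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([28, 60000],  [2, 4000], size=(8, 2)),
    rng.normal([35, 95000],  [2, 4000], size=(8, 2)),
    rng.normal([45, 150000], [2, 4000], size=(8, 2)),
])

Z = linkage(X, method='ward')
labels3 = fcluster(Z, t=3, criterion='maxclust')  # cut into 3 clusters
sil3 = silhouette_score(X, labels3)
print("Silhouette (3 clusters):", sil3)
```

A high silhouette on the cut labels is evidence the chosen cluster count matches the data's structure, complementing the cophenetic correlation, which only measures how faithfully the tree preserves pairwise distances.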
