23CC554
Pros:
• Easy to interpret
Cons:
• Sensitive to outliers
Applications:
• Customer segmentation
• Image compression
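The outlier sensitivity listed under Cons follows from centroids being plain means. A minimal illustration with made-up one-dimensional values (not taken from the dataset used in this report):

```python
import numpy as np

# Hypothetical 1-D cluster: three nearby points, then the same points plus one outlier
points = np.array([1.0, 2.0, 3.0])
with_outlier = np.append(points, 100.0)

centroid_clean = points.mean()          # 2.0
centroid_outlier = with_outlier.mean()  # 26.5

print(centroid_clean, centroid_outlier)
```

A single extreme value pulls the centroid far away from the three points that actually form the cluster.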
3. Iteratively assigns each point to the nearest centroid and updates each centroid to the mean of its assigned points.
4. Repeats the above steps until convergence or a max iteration limit is reached.
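The assignment and update steps described above can be sketched as follows. This is only an illustration, not the report's full program: the body of initialize_centroids is not shown in the text, so the random-row version here is an assumption, as are the helper assign, the parameters max_iters and seed, and the toy data.

```python
import numpy as np

def initialize_centroids(X, k, seed=0):
    """Assumed strategy: pick k distinct rows of X at random as starting centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].astype(float)

def assign(X, centroids):
    """Return, for every point, the index of its nearest centroid (Euclidean distance)."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(X, k, max_iters=100, seed=0):
    centroids = initialize_centroids(X, k, seed)
    for _ in range(max_iters):
        labels = assign(X, centroids)
        # Update step: mean of assigned points; an empty cluster keeps its old centroid
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # convergence check
            break
        centroids = new_centroids
    return centroids, assign(X, centroids)

# Toy data (made up for illustration): two well-separated groups of three points
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
demo_centroids, demo_labels = kmeans(X_demo, k=2)
print(demo_centroids)
print(np.bincount(demo_labels))  # cluster sizes
```

np.bincount on the label array gives the per-cluster sizes, the same kind of summary reported in the output section.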
# Initialize centroids
centroids = initialize_centroids(X, k)
Output-
Final centroids:
[[ 48.16831683  43.3960396 ]
 [109.7         22.        ]
 [ 78.89285714  17.42857143]
 [ 86.53846154  82.12820513]
 [ 25.72727273  79.36363636]]
Cluster 1 size: 101
Cluster 2 size: 10
Cluster 3 size: 28
Cluster 4 size: 39
Cluster 5 size: 22
Code 2: Hierarchical clustering
# Import necessary libraries
import numpy as np # For numerical computations
import matplotlib.pyplot as plt # For plotting
from scipy.cluster.hierarchy import dendrogram, linkage # For hierarchical clustering and dendrogram
import pandas as pd # For data handling
j)]
clusters.append(new_cluster)
# Use SciPy to compute the linkage matrix for dendrogram (using single linkage method)
Z = linkage(data, method='single')
1. Load Data: It reads the dataset and extracts two features: eruptions and waiting.
3. Manual Clustering Loop: Each data point starts as its own cluster; the closest pair of
clusters is merged at each iteration until only one cluster remains (target_clusters = 1).
4. Output Clusters: It prints the final single cluster made by merging all points (you can
modify target_clusters to stop earlier).
5. Dendrogram Plot: Uses linkage from scipy.cluster.hierarchy to compute the clustering steps
and plots a dendrogram to visualize the cluster merges.
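The manual merging loop and the SciPy call described in the steps above can be sketched as below. The report's full code is not reproduced here, so the function names single_link_distance and agglomerate, the Euclidean single-linkage distance in the manual loop, and the five toy points are assumptions made for illustration; only linkage(data, method='single'), target_clusters, new_cluster, and clusters.append appear in the original.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def single_link_distance(a, b):
    """Smallest pairwise Euclidean distance between two clusters of points."""
    return min(np.linalg.norm(p - q) for p in a for q in b)

def agglomerate(data, target_clusters=1):
    clusters = [[point] for point in data]  # every point starts as its own cluster
    while len(clusters) > target_clusters:
        best = None
        # Find the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        new_cluster = clusters[i] + clusters[j]
        # Drop the two merged clusters and add their union
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(new_cluster)
    return clusters

# Toy values (made up, not rows from the report's dataset)
data = np.array([[1.8, 54.0], [1.9, 51.0], [4.5, 79.0], [4.7, 83.0], [4.3, 85.0]])

two_clusters = agglomerate(data, target_clusters=2)  # stop before everything merges
print([len(c) for c in two_clusters])

Z = linkage(data, method='single')  # one row per merge: (n - 1) x 4 matrix
print(Z.shape)
```

Setting target_clusters to 2 stops the loop one merge early, which is the modification suggested in step 4; passing Z to dendrogram would then draw the merge tree.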
Output-
Final clusters:
Cluster 1: [array([ 5.1, 96. ]), array([ 1.983, 43. ]), array([ 1.833,
57. ]), array([ 2.083, 57. ]), array([ 2.083, 57. ]), array([ 1.817,
60. ]), array([ 2.2, 60. ]), array([ 2.233, 60. ]), array([ 2.25, 60.
]), array([ 2.017, 60. ]), array([ 2.1, 60. ]), array([ 2., 58.]),
array([ 1.75, 58. ])........