
ASSIGNMENT-10

Name: Kanishq Malhotra    Roll No: 23UCC554

Code 1: K-Means clustering

K-Means Clustering is a popular unsupervised machine learning algorithm used for grouping similar data points into K clusters. It's especially useful when you want to find structure or patterns in your data without predefined labels.

Pros and Cons

Pros:

• Simple and fast

• Works well on large datasets

• Easy to interpret

Cons:

• You need to choose K in advance (see the elbow-method sketch after this list)

• Sensitive to outliers

• May converge to a local minimum (results depend on initial centroids)
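
Since K must be chosen up front, a common heuristic is the elbow method: run K-Means for several values of K and plot the within-cluster sum of squares (inertia), looking for the "elbow" where further increases in K stop helping much. A minimal sketch, assuming scikit-learn is installed and X is the feature matrix defined in the code below:

# Elbow-method sketch (assumes scikit-learn is available; X is the
# Annual Income / Spending Score feature matrix used later in this assignment)
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)  # Within-cluster sum of squared distances

plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('K')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()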

Example Use Cases

• Customer segmentation

• Image compression

• Market basket analysis

• Document classification


This code performs K-Means clustering on a mall customer dataset using Annual Income and
Spending Score as features. It:

1. Loads the data and extracts the relevant columns.

2. Randomly initializes k=5 centroids.

3. Iteratively assigns each point to the nearest centroid and updates centroids based on the mean of their assigned points (the objective these two steps minimize is written out after this list).

4. Repeats the above steps until convergence or a max iteration limit is reached.

5. Prints final centroids and the number of points in each cluster.

6. Visualizes the clusters and centroids on a 2D scatter plot.
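
For reference, steps 3 and 4 implement the standard K-Means objective: each iteration reduces the within-cluster sum of squares (the notation below is standard textbook notation, not from the assignment):

    J = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
    \qquad \mu_i = \frac{1}{\lvert C_i \rvert} \sum_{x \in C_i} x

The assignment step places each point x in the cluster whose centroid \mu_i is nearest, and the update step recomputes each \mu_i as the mean of its assigned points.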


# Import necessary libraries
import pandas as pd              # For data manipulation
import numpy as np               # For numerical computations
import matplotlib.pyplot as plt  # For plotting
import random                    # For random number generation

# Load the dataset from CSV file
data = pd.read_csv('Mall_Customers.csv')

# Extract the relevant features (Annual Income and Spending Score) as a NumPy array
X = data[['Annual Income (k$)', 'Spending Score (1-100)']].values

# Define the number of clusters (you can change this as needed)
k = 5

# Step 1: Randomly initialize centroids from the data points
def initialize_centroids(X, k):
    centroids_idx = random.sample(range(len(X)), k)  # Randomly pick k unique indices
    centroids = [X[i] for i in centroids_idx]        # Select the corresponding data points as centroids
    return np.array(centroids)

# Initialize centroids
centroids = initialize_centroids(X, k)

# Function to calculate Euclidean distance between two points
def euclidean_distance(p1, p2):
    return np.sqrt(np.sum((p1 - p2) ** 2))  # Standard Euclidean distance formula

# Function to assign each data point to the nearest centroid
def assign_clusters(X, centroids):
    clusters = []
    for point in X:
        distances = [euclidean_distance(point, centroid) for centroid in centroids]  # Distance to each centroid
        clusters.append(np.argmin(distances))  # Assign to the nearest centroid
    return clusters

# Function to update centroids by calculating the mean of points in each cluster
def update_centroids(X, clusters, k):
    new_centroids = []
    for i in range(k):
        cluster_points = X[np.array(clusters) == i]  # Get all points assigned to cluster i
        if len(cluster_points) > 0:
            new_centroids.append(np.mean(cluster_points, axis=0))  # Compute mean if cluster is not empty
        else:
            new_centroids.append(initialize_centroids(X, 1)[0])    # Reinitialize empty cluster centroid
    return np.array(new_centroids)

# Run the K-Means algorithm
max_iters = 100  # Maximum number of iterations
for i in range(max_iters):
    clusters = assign_clusters(X, centroids)          # Step 1: Assign points to clusters
    new_centroids = update_centroids(X, clusters, k)  # Step 2: Update centroids

    # Check for convergence (if centroids do not change significantly)
    if np.allclose(centroids, new_centroids, rtol=1e-4):
        break  # Stop iterating once converged
    centroids = new_centroids  # Update centroids for the next iteration

# Print final centroids
print("Final centroids:")
print(centroids)

# Print the number of points in each cluster
for i in range(k):
    cluster_size = len(X[np.array(clusters) == i])  # Count points assigned to cluster i
    print(f"Cluster {i + 1} size: {cluster_size}")

# Plot the clusters
plt.figure(figsize=(8, 6))  # Set figure size
for i in range(k):
    cluster_points = X[np.array(clusters) == i]  # Points in cluster i
    plt.scatter(cluster_points[:, 0], cluster_points[:, 1], label=f'Cluster {i + 1}')  # Scatter plot

# Plot the centroids in red with 'x' marker
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=200, c='red', label='Centroids')
plt.xlabel('Annual Income (k$)')                 # X-axis label
plt.ylabel('Spending Score (1-100)')             # Y-axis label
plt.title('Customer Segmentation using KMeans')  # Title of the plot
plt.legend()  # Show legend
plt.show()    # Display the plot

Output-
Final centroids:
[[ 48.16831683  43.3960396 ]
 [109.7         22.        ]
 [ 78.89285714  17.42857143]
 [ 86.53846154  82.12820513]
 [ 25.72727273  79.36363636]]
Cluster 1 size: 101
Cluster 2 size: 10
Cluster 3 size: 28
Cluster 4 size: 39
Cluster 5 size: 22
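
As an optional cross-check (not part of the assignment code), scikit-learn's KMeans can be run on the same features and its centroids compared with the manual result. This is a sketch assuming scikit-learn is installed; exact values will differ slightly because initialization is random:

# Optional cross-check (assumes scikit-learn is installed and X is the
# feature matrix defined above): compare against a library implementation.
from sklearn.cluster import KMeans
import numpy as np

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
print("scikit-learn centroids:")
print(km.cluster_centers_)      # Should land near the manual centroids
print("scikit-learn cluster sizes:")
print(np.bincount(km.labels_))  # Number of points in each cluster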
Code 2: Hierarchical clustering
# Import necessary libraries
import numpy as np               # For numerical computations
import matplotlib.pyplot as plt  # For plotting
from scipy.cluster.hierarchy import dendrogram, linkage  # For hierarchical clustering and dendrogram
import pandas as pd              # For data handling

# Load the faithful dataset (make sure the path is correct)
data = pd.read_csv('/content/faithful.csv')   # Load CSV file containing the dataset
data = data[['eruptions', 'waiting']].values  # Extract only the two relevant features as a NumPy array

# Function to calculate Euclidean distance between two points
def euclidean_distance(p1, p2):
    return np.sqrt(np.sum((p1 - p2) ** 2))  # Standard Euclidean distance formula

# Function to calculate distance between two clusters using single linkage (minimum pairwise distance)
def cluster_distance(c1, c2):
    return min([euclidean_distance(p1, p2) for p1 in c1 for p2 in c2])  # Minimum distance between all point pairs

# Initialize: treat each point as its own cluster
clusters = [[point] for point in data]

# Set the desired number of clusters (can be changed as needed)
target_clusters = 1

# Repeat until only the target number of clusters remains
while len(clusters) > target_clusters:
    min_dist = float('inf')  # Initialize minimum distance to a large value
    to_merge = (None, None)  # Initialize pair of clusters to merge

    # Find the two closest clusters based on single linkage distance
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            dist = cluster_distance(clusters[i], clusters[j])  # Compute distance between cluster i and j
            if dist < min_dist:    # If this is the smallest so far
                min_dist = dist
                to_merge = (i, j)  # Update the clusters to be merged

    # Merge the two closest clusters
    i, j = to_merge
    new_cluster = clusters[i] + clusters[j]  # Combine the two clusters
    # Remove the merged clusters and add the new one
    clusters = [clusters[x] for x in range(len(clusters)) if x not in (i, j)]
    clusters.append(new_cluster)

# Print the final clusters
print("Final clusters:")
for idx, cluster in enumerate(clusters):
    print(f"Cluster {idx + 1}: {cluster}")

# Use SciPy to compute the linkage matrix for the dendrogram (using single linkage method)
Z = linkage(data, method='single')

# Plot the dendrogram to visualize the hierarchical clustering
plt.figure(figsize=(10, 6))  # Set the figure size
dendrogram(Z)                # Plot the dendrogram
plt.title('Dendrogram for Faithful Dataset')  # Title of the plot
plt.xlabel('Data Points')    # X-axis label
plt.ylabel('Distance')       # Y-axis label

# Draw a horizontal line to show a distance threshold for cutting the dendrogram
threshold = 1.5  # Set a threshold value (can adjust this)
plt.axhline(y=threshold, color='r', linestyle='--', label='Threshold')  # Horizontal red dashed line
plt.legend()  # Show legend for the threshold line

plt.show()  # Display the dendrogram plot

# Visualize the final clusters in a 2D scatter plot
plt.figure(figsize=(8, 6))  # Set figure size
for idx, cluster in enumerate(clusters):
    cluster_points = np.array(cluster)  # Convert cluster to NumPy array for plotting
    plt.scatter(cluster_points[:, 0],   # X-values (eruptions)
                cluster_points[:, 1],   # Y-values (waiting)
                label=f'Cluster {idx + 1}')  # Label for legend

plt.title('Cluster Visualization')  # Plot title
plt.xlabel('eruptions')  # X-axis label
plt.ylabel('waiting')    # Y-axis label
plt.legend()  # Show legend

plt.show()  # Display the cluster visualization


CODE EXPLANATION:

1. Load Data: It reads the dataset and extracts two features: eruptions and waiting.

2. Define Distance Functions: It includes functions to compute Euclidean distance between points and the minimum distance between clusters (single linkage).

3. Manual Clustering Loop: Each data point starts as its own cluster; the closest pair of clusters is merged iteratively until only one remains (target_clusters = 1).

4. Output Clusters: It prints the final single cluster made by merging all points (you can modify target_clusters to stop earlier; see the fcluster sketch after this list).

5. Dendrogram Plot: Uses scipy.linkage to compute the clustering steps and plots a
dendrogram to visualize cluster merges.

6. Cluster Visualization: Displays the final cluster (or clusters, if target_clusters is changed) in a 2D scatter plot.
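
As a complementary step (not in the original code), SciPy's fcluster can cut the linkage matrix Z at the same threshold drawn on the dendrogram, recovering flat cluster labels without re-running the manual merge loop:

# Sketch of an extension: cut the dendrogram at the plotted threshold to get
# flat cluster labels directly from the SciPy linkage matrix Z.
from scipy.cluster.hierarchy import fcluster
import numpy as np

labels = fcluster(Z, t=1.5, criterion='distance')  # Same threshold as the red line
print("Clusters at threshold 1.5:", len(np.unique(labels)))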

Output-
Final clusters:

Cluster 1: [array([ 5.1, 96. ]), array([ 1.983, 43. ]), array([ 1.833, 57. ]), array([ 2.083, 57. ]), array([ 2.083, 57. ]), array([ 1.817, 60. ]), array([ 2.2, 60. ]), array([ 2.233, 60. ]), array([ 2.25, 60. ]), array([ 2.017, 60. ]), array([ 2.1, 60. ]), array([ 2., 58.]), array([ 1.75, 58. ])........
