Lab-7 Clustering

Clustering of data

Uploaded by

Ranga Timilsina

Lab 5: Clustering with K-Means

Use the K-means algorithm to cluster the given data points with K=3:

A1(2, 10), A2(2, 5), A3(8, 4),
B1(5, 8), B2(7, 5), B3(6, 4),
C1(1, 2), C2(4, 9)

import numpy as np
from sklearn.cluster import KMeans

# Define the data points (in the order A1, A2, A3, B1, B2, B3, C1, C2)
data_points = np.array([
    [2, 10], [2, 5], [8, 4],
    [5, 8], [7, 5], [6, 4],
    [1, 2], [4, 9]
])

# Apply KMeans clustering with K=3
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(data_points)

# Get the labels (which cluster each point belongs to) and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Print the cluster labels for each data point
print("Cluster Labels:", labels)

# Print the centroids of the clusters
print("Centroids of the clusters:", centroids)

# Initialize a dictionary to store the clusters
clusters = {0: [], 1: [], 2: []}

# Organize data points into clusters
# (tolist() converts NumPy integers to plain ints so the tuples print cleanly)
for i, label in enumerate(labels):
    clusters[int(label)].append(tuple(data_points[i].tolist()))

# Print the clusters in the requested format
for i in range(3):
    print(f"Cluster {i+1}: {', '.join(map(str, clusters[i]))}")

# Code for visualization
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))

# Plot the data points with a different color for each cluster
for i in range(3):
    cluster_points = data_points[labels == i]  # rows assigned to cluster i
    plt.scatter(cluster_points[:, 0], cluster_points[:, 1],
                label=f"Cluster {i+1}")

# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='x', s=200,
            label="Centroids")

# Add the title, axis labels, and legend
plt.title('K-Means Clustering (K=3) with Centroids')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
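
The scikit-learn call above hides the individual K-means steps. As a rough illustration only, the assign-then-update loop can be sketched in plain NumPy. The function name kmeans_lloyd and the choice of A1, B1 and C1 as starting centroids are assumptions made for this sketch; the lab statement does not specify initial centroids, and scikit-learn picks its own, so its cluster numbering may differ.

```python
import numpy as np

def kmeans_lloyd(points, init_centroids, max_iter=100):
    """Hypothetical helper: repeat 'assign to nearest centroid, recompute means'."""
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(max_iter):
        # Euclidean distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # New centroid = mean of the points assigned to it
        # (this sketch assumes no cluster ever becomes empty)
        new_centroids = np.array([points[assignment == k].mean(axis=0)
                                  for k in range(len(centroids))])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignment is final
        centroids = new_centroids
    return assignment, centroids

pts = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

# Assumed starting centroids: A1, B1 and C1 (rows 0, 3 and 6)
manual_labels, manual_centroids = kmeans_lloyd(pts, pts[[0, 3, 6]])
print("Labels:", manual_labels)
print("Centroids:", manual_centroids)
```

With these assumed starting centroids the loop settles on the groups {A1, B1, C2}, {A3, B2, B3} and {A2, C1}. A different initialization could in principle converge elsewhere, which is why the scikit-learn version fixes random_state.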

Lab work:

1. Use the DBSCAN algorithm to cluster the given data points with MinPts = 3 and
epsilon = 3:
A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9)
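
One possible starting point for this task, sketched with scikit-learn's DBSCAN class. It assumes that MinPts corresponds to the min_samples parameter, which in scikit-learn counts the point itself, and that distances are Euclidean (the default). The variable names are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

names = ["A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"]
points = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

# eps is the neighborhood radius; min_samples includes the point itself
dbscan = DBSCAN(eps=3, min_samples=3).fit(points)
dbscan_labels = dbscan.labels_

# scikit-learn marks noise points with the label -1
for name, label in zip(names, dbscan_labels):
    tag = "noise" if label == -1 else f"cluster {label}"
    print(f"{name}: {tag}")
```

Under this convention A3, B2 and B3 form one cluster, A1, B1 and C2 form another, and A2 and C1 are reported as noise. A textbook definition of MinPts that excludes the point itself would need min_samples=4 and may give a different result.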

2. Load a dataset (Iris or Titanic) using Pandas. Explore the dataset: show its
information and descriptive statistics, and check for null values. Also visualize
the distributions of the dataset using matplotlib or seaborn.

# Task 1: Load dataset
import pandas as pd
from sklearn.datasets import load_iris
import seaborn as sns
import matplotlib.pyplot as plt

data = load_iris(as_frame=True)
df = data['frame']

# Task 2: Explore dataset
df.info()  # info() prints its summary directly and returns None
print(df.describe())
print(df.isnull().sum())  # Check for missing values

# Visualize class counts with a count plot (bar chart)
sns.countplot(x='target', data=df, palette="viridis")
plt.title("Class Counts in Iris Dataset")
plt.xlabel("Classes (0 = Setosa, 1 = Versicolor, 2 = Virginica)")
plt.ylabel("Count")
plt.show()
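
The count plot above shows only the class balance. To look at the distributions of the measurements themselves, one option is a histogram per feature column. This is a minimal sketch using the pandas DataFrame.hist method; the names iris_df and feature_df and the bin count of 15 are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris_df = load_iris(as_frame=True)["frame"]

# Drop the class label so only the four numeric measurements are plotted
feature_df = iris_df.drop(columns="target")

# One histogram per feature, laid out in a grid of subplots
axes = feature_df.hist(bins=15, figsize=(8, 6))
plt.suptitle("Feature Distributions in Iris Dataset")
plt.tight_layout()
plt.show()
```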

3. You are provided with the following dataset:

Staff   Age   Weight   Color
101     5     20       Brown
102     3     8        Black
103     2     0.5      Yellow
104     4     18       White
105     7     9        Brown

Perform the following tasks:

a) Apply one-hot encoding to the Color column.
b) Combine the encoded features with the rest of the dataset.
c) Display the resulting DataFrame.

import pandas as pd

# Create the dataset
data = {
    'Staff': [101, 102, 103, 104, 105],
    'Age': [5, 3, 2, 4, 7],
    'Weight': [20, 8, 0.5, 18, 9],
    'Color': ['Brown', 'Black', 'Yellow', 'White', 'Brown']
}
df = pd.DataFrame(data)

# Apply one-hot encoding
df_encoded = pd.get_dummies(df, columns=['Color'], dtype=int)

# Display the resulting DataFrame
print("One-Hot Encoded DataFrame:")
print(df_encoded)
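
Passing columns=['Color'] to get_dummies performs tasks (a) and (b) in a single call. For readers who want to see the combining step on its own, the same result can be sketched in two explicit steps with pd.concat. The names staff_df, color_dummies and combined are illustrative only.

```python
import pandas as pd

staff_df = pd.DataFrame({
    "Staff": [101, 102, 103, 104, 105],
    "Age": [5, 3, 2, 4, 7],
    "Weight": [20, 8, 0.5, 18, 9],
    "Color": ["Brown", "Black", "Yellow", "White", "Brown"],
})

# (a) Encode only the Color column; each category becomes a 0/1 column
color_dummies = pd.get_dummies(staff_df["Color"], prefix="Color", dtype=int)

# (b) Combine: drop the original Color column and append the encoded columns
combined = pd.concat([staff_df.drop(columns="Color"), color_dummies], axis=1)

# (c) Display the result
print(combined)
```

The encoded columns appear in alphabetical order of the categories (Color_Black, Color_Brown, Color_White, Color_Yellow), and the two Brown rows share the same indicator column.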
