Practical 03


Dr. Rafiq Zakaria Campus


Maulana Azad College of Arts, Science & Commerce
P.G. Dept. of Computer Science
M.Sc. III Semester
Data Mining and Warehousing
Practical 03
Date : 9th October 2024

Aim : To study Data Clustering using Python.

Description:

Data clustering is an unsupervised learning technique that groups similar data points together
without using predefined labels. This practical walks through clustering in Python with the
`scikit-learn` library, using K-Means as the algorithm.

1. Install Required Libraries

Make sure you have the necessary libraries installed:

pip install numpy pandas matplotlib scikit-learn
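
A quick way to confirm that the installation worked is to query the installed versions. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is an illustrative assumption and not part of the practical.

```python
from importlib import metadata

# Distribution names as passed to pip in the install step
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

versions = installed_versions(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Any package reported as not installed can be added by re-running the `pip install` command.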

2. Import Libraries

Start by importing the required libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

3. Generate Sample Data

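For this example, we’ll create synthetic data using `make_blobs`. The call below draws 300
points around 4 centers, each blob having a standard deviation of 0.60; fixing `random_state`
makes the data reproducible:

# Generate synthetic data: X holds the points, y_true the blob each point came from
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)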
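
Before plotting, it can help to check what `make_blobs` returned. The following is a small sketch that repeats the same call and inspects the result; it relies on the `make_blobs` default of two features per point.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# X is a 2-D array: one row per point, one column per feature
print(X.shape)            # (300, 2)

# y_true records which blob generated each point (not used by K-Means itself)
print(np.unique(y_true))  # [0 1 2 3]

# The 300 samples are split evenly across the 4 blobs
counts = np.bincount(y_true)
print(counts)             # [75 75 75 75]
```

Note that `y_true` is only available because the data is synthetic; with real data there are usually no such labels to compare against.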

4. Visualize the Data

Visualizing the data can help understand the structure before clustering:

plt.scatter(X[:, 0], X[:, 1], s=30)
plt.title('Sample Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

5. Perform K-Means Clustering

Now, let's apply the K-Means clustering algorithm:

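# Choose the number of clusters (here, the number of blobs generated in step 3)
k = 4
# n_init and random_state are set explicitly so that repeated runs give the same result
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(X)
# Get the cluster label assigned to each point
y_kmeans = kmeans.predict(X)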
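
To see what happens inside `fit`, the core K-Means loop (often called Lloyd's algorithm) can be sketched in a few lines of NumPy. This is a simplified illustration, not scikit-learn's implementation: the function name `lloyd_kmeans`, the random initialization, and the toy data are all assumptions made for this example.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return labels, centroids

# Toy data (hypothetical): two clearly separated groups of three points
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = lloyd_kmeans(X_toy, k=2)
print(labels)     # first three points share one label, last three the other
print(centroids)  # one centroid near (0.33, 0.33), one near (10.33, 10.33)
```

Which group receives label 0 depends on the random initialization, so only the grouping is meaningful, not the label numbers themselves.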

6. Visualize the Clusters

You can visualize the resulting clusters:

# Colour each point by its assigned cluster
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=30, cmap='viridis')
# Mark the cluster centers found by K-Means
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title('K-Means Clustering Results')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

7. Evaluate the Clustering

Evaluate the clustering using the silhouette score, which ranges from -1 to 1. Values close
to 1 indicate compact, well-separated clusters:

score = silhouette_score(X, y_kmeans)
print(f'Silhouette Score: {score:.3f}')
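
The silhouette value of a single point is (b - a) / max(a, b), where a is its mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster; the score is the average over all points. The sketch below computes this by hand on a tiny one-dimensional example. The function name `silhouette_by_hand` and the toy data are assumptions for illustration, and the function expects at least two clusters.

```python
import numpy as np

def silhouette_by_hand(X, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # convention for a single-point cluster
            continue
        a = dists[i, same].mean()  # mean distance within own cluster
        b = min(dists[i, labels == other].mean()  # nearest other cluster
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Toy data (hypothetical): two tight pairs far apart on a line
X_toy = [[0.0], [1.0], [10.0], [11.0]]
labels_toy = [0, 0, 1, 1]
score_toy = silhouette_by_hand(X_toy, labels_toy)
print(round(score_toy, 4))  # 0.8997
```

For the point at 0, a = 1 and b = (10 + 11) / 2 = 10.5, giving 9.5 / 10.5; the other points follow the same pattern, and the average is about 0.90.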

8. Choosing the Right Number of Clusters

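To choose the number of clusters, you can use the Elbow method: fit K-Means for a range of
k values, plot the inertia (the within-cluster sum of squared distances) against k, and look
for the point where the curve bends and further increases in k bring only small improvements:

inertia = []
K = range(1, 11)
for k in K:
    # Fit one model per candidate k and record its inertia
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(X)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8, 4))
plt.plot(K, inertia, 'bx-')
plt.xlabel('Number of clusters K')
plt.ylabel('Inertia')
plt.title('Elbow Method For Optimal K')
plt.show()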
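
The quantity plotted on the vertical axis can be reproduced by hand, which makes clear what the Elbow curve measures. The sketch below recomputes the inertia of a fitted model as the sum of squared distances from each point to its assigned center and compares it with `inertia_`. The `n_init=10` and `random_state=0` settings are assumptions added here for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Look up the center assigned to each point, then sum the squared distances
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = float(np.sum((X - assigned_centers) ** 2))

print(manual_inertia)
print(km.inertia_)  # should agree with the manual value up to rounding
```

Because adding clusters can only bring points closer to some center, inertia tends to keep falling as k grows, which is why the method looks for a bend rather than a minimum.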

Conclusion

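In this practical, you learned how to cluster data with K-Means in Python, visualize the
result, evaluate it with the silhouette score, and choose the number of clusters with the
Elbow method. Adjusting parameters such as the number of clusters and preprocessing your
data (for example, scaling the features) can yield better clustering results.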
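
One common preprocessing step is feature standardization, since K-Means relies on distances and a feature with a large numeric range can otherwise dominate. The sketch below uses scikit-learn's `StandardScaler`; the data values are hypothetical and chosen only to show two features on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the second feature is roughly 1000 times larger than the first
X_raw = np.array([[1.0, 1000.0],
                  [2.0, 3000.0],
                  [3.0, 5000.0],
                  [4.0, 7000.0]])

# Rescale each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_raw)

print(X_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_scaled.std(axis=0))   # approximately 1 for each feature
```

The scaled array can then be passed to `KMeans` in place of the raw data.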

Prepared by Khan Shagufta (Assistant Professor, P.G. Dept. of Computer Science)
