
Name: Vidya Janani V

Register Number: 913121205090

EX.NO: 4 CLUSTERING THE GIVEN DATA USING PYTHON/R


Date: 12.03.2024

AIM:

To perform clustering of the given data using K-Means in Python and R.

STEPS:

1. Data Preparation: Load and pre-process the data, ensuring it is in a suitable format for clustering.

2. Library Imports: Import the necessary Python libraries, such as sklearn for K-Means and matplotlib for visualization.

3. K-Means Clustering: Initialize and fit a K-Means model, specifying the number of clusters (K).

4. Visualization: Visualize the clusters to identify patterns and structures within the data.

PYTHON:

ELBOW METHOD:

The Elbow Method is used to find the optimal number of clusters (K) for K-Means clustering. The code loads a dataset, selects specific features, and calculates the within-cluster sum of squares (WSS) for K values ranging from 1 to 10. The resulting WSS values are plotted to locate the "elbow" point, where the rate of decrease in WSS slows down, indicating the optimal K. This helps in determining the most suitable number of clusters for the given dataset.
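The WSS loop described above can be sketched as follows; the synthetic blob data and the range of K values are illustrative stand-ins for the real dataset, which is not reproduced here:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data with three well-separated blobs (a stand-in
# for the real dataset)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Within-cluster sum of squares (WSS, sklearn's inertia_) for K = 1..10
wss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wss.append(km.inertia_)

# WSS shrinks as K grows; the "elbow" is where the marginal drop
# becomes small (here around K = 3, the true number of blobs)
```

Plotting `range(1, 11)` against `wss` then gives the elbow curve shown in the output below.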

K-MEANS

The Python code performs K-Means clustering with a specified number of clusters (K) on the dataset, reduced to two dimensions with PCA. It computes a cluster assignment for each record and visualizes the data points with a different color for each cluster; additionally, it plots the cluster centroids. The value passed to n_clusters should be the K chosen from the elbow plot, and the code provides a visual representation of the clustering results.
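Attaching the cluster assignments back to the dataset, as described above, can be sketched like this (the frame, column names, and choice of k below are hypothetical; the real code loads job_placement.csv instead):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Small synthetic frame standing in for the real dataset
# (assumption: it has at least two numeric columns)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'feature_1': rng.normal(size=90),
    'feature_2': rng.normal(size=90),
})

k = 3  # replace with the K chosen from the elbow plot
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)

# fit_predict returns one label per row; storing it as a new column
# keeps each observation paired with its cluster assignment
df['Cluster'] = kmeans.fit_predict(df[['feature_1', 'feature_2']])
```

Keeping the labels in a `Cluster` column (rather than a separate array) makes per-cluster summaries straightforward, e.g. `df.groupby('Cluster').mean()`.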

SCATTER PLOT

1. A scatter plot will be displayed, where data points are colored differently based on their assigned clusters, showing the clusters formed by K-Means.

2. The cluster centroids will be marked as red "X" symbols on the plot.

3. The title of the plot will indicate the number of clusters used for K-Means clustering (specified by the value passed to n_clusters).

4. A legend will be displayed in the upper right corner of the plot, indicating the labels for data points and centroids.
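A minimal, self-contained version of the plot described above can be sketched as follows (synthetic blobs stand in for the real data, and the Agg backend is selected so the sketch runs headless):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; no display window needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D blobs standing in for the PCA-reduced data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(40, 2))
               for c in ([0, 0], [4, 0], [2, 3])])

k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

fig, ax = plt.subplots()
# Data points, coloured by cluster label
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis',
           s=50, alpha=0.5, label='Data points')
# Centroids as red "X" markers, as described above
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           s=200, c='red', marker='X', label='Centroids')
ax.set_title(f'K-Means Clustering (K={k})')
ax.legend(loc='upper right')
```

The two `scatter` calls produce the two legend entries; `loc='upper right'` pins the legend where the description above expects it.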

Thus, K-Means clustering was performed on the job placement dataset (job_placement.csv).

21PCS02 – Exploratory Data Analysis Laboratory



Python Code:

1. Elbow Method

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset


data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset


print(data.head())

# Preprocessing the data


# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data


scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction


pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

# Elbow Method to find the optimal number of clusters


inertia = []
for i in range(1, 11):
    # n_init set explicitly for consistent behavior across sklearn versions
    kmeans = KMeans(n_clusters=i, n_init=10, random_state=42)
    kmeans.fit(pca_data)
    inertia.append(kmeans.inertia_)

# Plotting the Elbow Method


plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()


Output

2. K-Means Clustering
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset


data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset


print(data.head())

# Preprocessing the data


# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data


scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction


pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)


# Applying K-means clustering


kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(pca_data)

# Visualizing the clusters


plt.figure(figsize=(8, 6))
plt.scatter(pca_data[:, 0], pca_data[:, 1], c=cluster_labels, cmap='viridis', s=50, alpha=0.5)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centroids')
plt.title('K-means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()

Output


3. Scatter plot
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the dataset


data = pd.read_csv("job_placement.csv")

# Display the first few rows of the dataset


print(data.head())

# Preprocessing the data


# Dropping non-numeric columns if any and handling missing values
data = data.dropna()
numeric_data = data.select_dtypes(include=['float64', 'int64'])

# Standardizing the data


scaler = StandardScaler()
scaled_data = scaler.fit_transform(numeric_data)

# Applying PCA for dimensionality reduction


pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

# Applying K-means clustering


kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(pca_data)

# Visualizing the clusters


plt.figure(figsize=(10, 6))

# Plotting points with cluster centers


plt.scatter(pca_data[:, 0], pca_data[:, 1], c=cluster_labels, cmap='viridis', s=50, alpha=0.5)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='red', marker='X', label='Centroids')

plt.title('K-means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid(True)
plt.show()


Output

Result:
In this experiment, clustering of the given data using K-Means in Python/R was implemented and the output was verified successfully.

