Vid 4
AIM:
To implement clustering of the given data using Python/R.
STEPS:
1. Data Preparation: Load and pre-process the data. Ensure it is in a suitable format for
clustering.
2. Library Imports: Import the necessary Python libraries, such as sklearn for K-Means
and matplotlib for visualization.
3. K-Means Clustering: Initialize and fit a K-Means model, specifying the number of
clusters (K).
4. Visualization: Visualize the clusters to identify patterns and structures within the data.
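As an illustration of step 1, the following is a minimal sketch of data preparation. The small DataFrame and the column names feature_a and feature_b are hypothetical stand-ins, not the columns of the actual dataset. It assumes that rows with missing values are dropped and that the features are standardized so each one contributes equally to the distance calculation.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the real dataset and its column names are not shown here.
df = pd.DataFrame({
    "feature_a": [10.0, 12.0, None, 80.0, 85.0, 90.0],
    "feature_b": [1.0, 1.5, 2.0, 9.0, 9.5, 10.0],
})


def prepare_features(frame, columns):
    """Select the given columns, drop rows with missing values, and standardize."""
    clean = frame[columns].dropna()
    # StandardScaler rescales each column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(clean)
    return clean, scaled


clean_df, X_scaled = prepare_features(df, ["feature_a", "feature_b"])
print(X_scaled.shape)  # one row was dropped because of the missing value
```

Standardizing is a design choice rather than a requirement: K-Means uses Euclidean distance, so a feature with a much larger range would otherwise dominate the clustering.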
PYTHON:
ELBOW METHOD:
The Elbow Method is used to find the optimal number of clusters (K) for K-Means clustering. The
code loads a dataset, selects specific features, and calculates the within-cluster sum of squares
(WSS) for K values ranging from 1 to 10. The resulting WSS values are plotted against K to
visualize the "elbow" point where the rate of decrease in WSS slows down, indicating the
optimal K. This helps in determining the most suitable number of clusters for the given dataset.
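The procedure described above can be sketched as follows. This is an illustrative example on synthetic data, not the code used in the experiment: the three generated blobs, the helper name compute_wss, and the random seeds are all assumptions. The WSS of a fitted model is read from the inertia_ attribute of sklearn's KMeans.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans


def compute_wss(X, k_values, random_state=0):
    """Fit K-Means for each K and return the within-cluster sum of squares."""
    wss = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        wss.append(model.inertia_)  # inertia_ is the WSS of the fitted model
    return wss


# Synthetic data (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = [(0, 0), (5, 5), (0, 5)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

k_values = list(range(1, 11))
wss = compute_wss(X, k_values)

# Plot WSS against K; the bend in the curve suggests a suitable K.
fig, ax = plt.subplots()
ax.plot(k_values, wss, marker="o")
ax.set_title("Elbow Method")
ax.set_xlabel("Number of clusters (K)")
ax.set_ylabel("WSS")
ax.grid(True)
```

On this synthetic data the curve should drop steeply up to K = 3 and flatten afterwards. To view the plot interactively, remove the matplotlib.use("Agg") line and call plt.show().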
K-MEANS
The Python code performs K-Means clustering with a specified number of clusters (K) on a
dataset with two selected features. It adds cluster assignments to the original dataset and
visualizes the data points with different colors for each cluster. Additionally, it plots the
cluster centroids. The "k" variable should be set to the chosen number of clusters before the
code is run.
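A minimal sketch of the fitting step described above is given here. It assumes synthetic two-feature data with hypothetical column names feature_a and feature_b, and the value k = 2 is chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic two-feature data (assumption): two well-separated groups of 40 points.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(loc=c, scale=0.4, size=(40, 2)) for c in [(0, 0), (4, 4)]])
df = pd.DataFrame(points, columns=["feature_a", "feature_b"])

k = 2  # replace with the number of clusters chosen from the elbow plot

# Fit K-Means on the two selected features and get one label per row.
model = KMeans(n_clusters=k, n_init=10, random_state=0)
labels = model.fit_predict(df[["feature_a", "feature_b"]])

# Add the cluster assignments to the original dataset.
df["cluster"] = labels

# One centroid per cluster, in the same feature space as the input.
centroids = model.cluster_centers_
print(df["cluster"].value_counts())
```

Fixing random_state makes the run repeatable; the numeric cluster labels themselves are arbitrary and may differ between seeds.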
SCATTER PLOT
1. A scatter plot will be displayed, where data points are colored differently based on their
assigned clusters, showing the clusters formed by K-Means.
2. The cluster centroids will be marked as red "x" symbols on the plot.
3. The title of the plot will indicate the number of clusters used for K-Means clustering
(specified by the 'k' variable).
4. A legend will be displayed in the upper right corner of the plot, indicating the labels for
data points and centroids.
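The four points above can be reproduced with a sketch like the following. It is an example under stated assumptions: the data is synthetic, k = 3 is illustrative, and the axis labels "Feature 1" and "Feature 2" are placeholders for the actual feature names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumption): three 2-D blobs.
rng = np.random.default_rng(1)
centers = [(0, 0), (5, 0), (2.5, 4)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2)) for c in centers])

k = 3  # illustrative value; use the K chosen for the real data
model = KMeans(n_clusters=k, n_init=10, random_state=0)
labels = model.fit_predict(X)
centroids = model.cluster_centers_

fig, ax = plt.subplots()
# 1. Data points colored by their assigned cluster.
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", label="Data points")
# 2. Centroids marked as red "x" symbols.
ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="x", s=100, label="Centroids")
# 3. Title showing the number of clusters.
ax.set_title(f"K-Means Clustering (k={k})")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
# 4. Legend in the upper right corner.
ax.legend(loc="upper right")
ax.grid(True)
```

To view the figure interactively, remove the matplotlib.use("Agg") line and call plt.show() at the end.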
Thus, K-Means clustering was performed on the global air pollution dataset.
Python Code:
1. Elbow method
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
Output
2. K-Means Clustering
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
Output
3. Scatter plot
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
plt.title('K-means Clustering')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid(True)
plt.show()
Output
Result:
In this experiment, clustering of the given data using Python/R was implemented and the
output was verified successfully.