Project Report Data Mining
WATER POTABILITY
Cluster Analysis
Shreya Singh
220700
BA Programme (CA+Maths)
AGENDA
K-Means Clustering
Agglomerative Hierarchical Clustering
DBSCAN
ABOUT DATASET - WATER POTABILITY
Access to safe drinking-water is essential to health, a basic human right and a component of
effective policy for health protection. This is important as a health and development issue at a
national, regional and local level.
The dataset is labelled and numeric, and has the following columns of information:
AGGLOMERATIVE CLUSTERING
SILHOUETTE SCORE
Finding the optimum number of clusters by visualizing the silhouette score of the dataset.
In this case, 2 is the optimum number of clusters.
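The selection step above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: it assumes scikit-learn, the helper name `best_k_by_silhouette` and the candidate range 2 to 6 are hypothetical, and the data is a synthetic stand-in rather than the water potability dataset.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values=range(2, 7)):
    """Return (best_k, scores), where scores maps k to the mean silhouette score."""
    scores = {}
    for k in k_values:
        # Fit agglomerative clustering (default Ward linkage) with k clusters.
        labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # The k with the highest mean silhouette score is taken as the optimum.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data: two well-separated blobs (NOT the project dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(60, 3)),
    rng.normal(loc=5.0, scale=0.5, size=(60, 3)),
])
best_k, scores = best_k_by_silhouette(X_demo)
print(best_k, scores)
```

Plotting `scores` against k gives the kind of silhouette chart described on this slide; the peak marks the suggested number of clusters.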
DBSCAN CLUSTERING
NEAREST NEIGHBOUR
minPts = 2 * dimensions
Finding the K-Nearest Neighbours and visualizing them.
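A minimal sketch of the k-distance computation, assuming scikit-learn's `NearestNeighbors`. The helper name `sorted_k_distances`, the choice k = minPts, and the random stand-in data are assumptions made for illustration; the slides do not state which k was used.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def sorted_k_distances(X, k):
    """Distance from each point to its k-th nearest neighbour, sorted ascending."""
    # Ask for k + 1 neighbours: when querying the training data itself,
    # each point is returned as its own nearest neighbour at distance 0.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)
    # The last column holds the distance to the k-th neighbour (self excluded).
    return np.sort(distances[:, -1])


# Stand-in data with 3 features (NOT the project dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))

min_pts = 2 * X_demo.shape[1]  # minPts = 2 * dimensions, as on the slide
k_dist = sorted_k_distances(X_demo, k=min_pts)  # assumption: k = minPts
print(min_pts, k_dist[:3], k_dist[-3:])
```

Plotting `k_dist` against its index gives the sorted k-distance curve whose bend is used to pick eps.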
KNEE LOCATOR
Locating the knee point with Knee Locator in order to find the value of eps.
Here, eps = 0.61.
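To illustrate the idea behind locating a knee, here is a small NumPy-only stand-in. It is not the Knee Locator tool named above: it uses a simpler heuristic (the point on the sorted curve farthest from the straight line joining its endpoints), the function name `knee_by_chord_distance` is hypothetical, and the curve is made up, so it does not reproduce the value 0.61.

```python
import numpy as np


def knee_by_chord_distance(y):
    """Index of the point farthest below the chord joining the curve's endpoints.

    Assumes y is sorted ascending, has at least two points, and y[-1] > y[0].
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Normalise both axes to [0, 1] so neither scale dominates.
    x_n = (x - x[0]) / (x[-1] - x[0])
    y_n = (y - y[0]) / (y[-1] - y[0])
    # After normalising, the chord is the diagonal y = x. For a convex,
    # increasing curve the gap x_n - y_n is proportional to the distance
    # from the chord, so its maximum marks the knee.
    return int(np.argmax(x_n - y_n))


# Made-up k-distance curve: a long flat stretch followed by a sharp rise.
k_dist_demo = np.concatenate([
    np.linspace(0.1, 0.5, 90),
    np.linspace(0.6, 3.0, 10),
])
knee_idx = knee_by_chord_distance(k_dist_demo)
eps_demo = k_dist_demo[knee_idx]
print(knee_idx, eps_demo)
```

The k-distance at the knee index is then read off as the candidate eps for DBSCAN.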
DBSCAN CLUSTERING
Labelling the clusters and working out their composition.
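Labelling and counting cluster members might look like the sketch below, assuming scikit-learn's `DBSCAN`. The helper name `cluster_composition` and the synthetic data are illustrative only; eps = 0.61 is taken from the slides, but applied here to stand-in data, so the counts say nothing about the real dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_composition(X, eps, min_samples):
    """Run DBSCAN and return (labels, {label: count}); label -1 marks noise."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    values, counts = np.unique(labels, return_counts=True)
    return labels, dict(zip(values.tolist(), counts.tolist()))


# Synthetic stand-in: two tight 2-D blobs plus one distant outlier.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.2, size=(50, 2)),
    [[20.0, 20.0]],  # isolated point, expected to be labelled as noise
])

labels, composition = cluster_composition(
    X_demo,
    eps=0.61,                        # value quoted on the slides
    min_samples=2 * X_demo.shape[1]  # minPts = 2 * dimensions
)
n_clusters = len([lab for lab in composition if lab != -1])
print(n_clusters, composition)
```

The dictionary gives the size of each cluster, with the key -1 holding the number of points DBSCAN treated as noise.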
VISUALIZING DBSCAN CLUSTERING
Visualizing the clusters formed by DBSCAN after reducing the dimensions with
Principal Component Analysis (PCA).
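The dimensionality reduction step could be sketched as follows, assuming scikit-learn's `PCA`. Standardising the features first is an assumption (the slides do not say whether it was done), the helper name `project_to_2d` is hypothetical, and the input is random stand-in data with an arbitrary number of features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_to_2d(X):
    """Standardise X, then project it onto its first two principal components."""
    # Assumption: features are scaled to zero mean and unit variance first,
    # so that no single column dominates the components.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(X_scaled)
    return coords, pca.explained_variance_ratio_


# Random stand-in data (NOT the project dataset); 9 features chosen arbitrarily.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 9))

coords, variance_ratio = project_to_2d(X_demo)
print(coords.shape, variance_ratio)
```

The two columns of `coords` can then be drawn as a scatter plot, coloured by the DBSCAN labels, for example with matplotlib's `plt.scatter(coords[:, 0], coords[:, 1], c=labels)`.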
VISUALIZING DBSCAN CLUSTERING
The dataset is divided into 2 clusters and shown on the scatter plot.
THANK YOU!