Project Report Data Mining

Data Mining Research

Uploaded by

220700


Project Report

WATER POTABILITY
Cluster Analysis

Shreya Singh
220700
BA Programme (CA+Maths)
AGENDA

K-Means Clustering
Agglomerative Hierarchical Clustering
DBSCAN
ABOUT DATASET - WATER POTABILITY

Access to safe drinking-water is essential to health, a basic human right and a component of
effective policy for health protection. This is important as a health and development issue at a
national, regional and local level.

The dataset is labelled and numeric, and has the following columns:

pH value
Hardness
Chloramines
Sulfate
Conductivity
Organic Carbon
Trihalomethanes
Turbidity
Potability (label)
** The label column (Potability) is dropped before clustering, since clustering does not use labels.
DATASET PREPROCESSING
NULL VALUES
Null values are replaced with the mean of their respective columns.
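A minimal sketch of column-mean imputation, assuming the data is held in a NumPy array with missing values encoded as NaN. The report does not say which tool was used; in pandas, `df.fillna(df.mean())` is a common equivalent. The example values are hypothetical.

```python
import numpy as np

def impute_column_means(X):
    """Return a copy of X with each NaN replaced by its column's mean.

    The mean is computed over the non-missing values of that column only.
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)    # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))   # positions of the missing values
    X[rows, cols] = col_means[cols]      # fill each gap with its column mean
    return X

# Hypothetical example: two columns, one gap in each
data = np.array([[7.0, np.nan],
                 [np.nan, 200.0],
                 [8.0, 220.0]])
filled = impute_column_means(data)
```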
OUTLIERS
The presence of outliers is checked by plotting a boxplot of each column.

Almost all columns have some outliers.
OUTLIERS
Outliers are removed from all columns using the Inter-Quartile Range (IQR) method.
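A minimal sketch of IQR-based outlier removal. The report does not state the multiplier or whether a whole row is dropped when one of its values is an outlier; this sketch assumes the conventional factor of 1.5 and drops such rows.

```python
import numpy as np

def remove_iqr_outliers(X, factor=1.5):
    """Drop every row that has an outlier in any column.

    A value is an outlier when it lies outside
    [Q1 - factor * IQR, Q3 + factor * IQR] for its column.
    """
    X = np.asarray(X, dtype=float)
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # A row is kept only if every one of its values is inside the bounds
    keep = np.all((X >= lower) & (X <= upper), axis=1)
    return X[keep]

# Hypothetical single-column example with one extreme value
sample = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
cleaned = remove_iqr_outliers(sample)
```

Here Q1 = 2 and Q3 = 4, so the upper bound is 4 + 1.5 × 2 = 7 and the row containing 100 is removed.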
MINMAX SCALER
The dataset is scaled to the range [0, 1] using MinMaxScaler.
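With its default `feature_range=(0, 1)`, scikit-learn's MinMaxScaler applies x' = (x - min) / (max - min) to each column. A plain NumPy sketch of that transform follows; the guard for constant columns is an assumption of this sketch, and the values are hypothetical.

```python
import numpy as np

def min_max_scale(X):
    """Scale each column of X linearly to [0, 1]: (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # avoid division by zero for constant columns
    return (X - col_min) / col_range

scaled = min_max_scale([[6.0, 100.0], [8.0, 300.0], [7.0, 200.0]])
```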
K-MEANS CLUSTERING
INERTIA
Inertia is computed on all feature columns of the dataset for a range of values of k, in order to find the optimum k for K-Means clustering.
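Inertia is the within-cluster sum of squared distances. For the points x_i split into k clusters C_1, ..., C_k with centroids μ_j:

```latex
\text{Inertia}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

It is evaluated for each candidate k (the exact range used in the report is not stated). The optimal inertia cannot increase when one more cluster is allowed, so the curve is read for the point where the decrease levels off, not for a minimum.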
ELBOW METHOD
The elbow point of the inertia curve gives k = 2.
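One simple way to locate an elbow programmatically is to pick the k where the drop in inertia slows down the most (a discrete second difference). This heuristic is an assumption for illustration, not necessarily how the report found its elbow, and the inertia values are hypothetical.

```python
def elbow_by_second_difference(ks, inertias):
    """Pick the k where the drop in inertia slows down the most.

    For each interior k, compare the drop just before it with the drop
    just after it; the elbow is where that difference is largest.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        drop_before = inertias[i - 1] - inertias[i]
        drop_after = inertias[i] - inertias[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

# Hypothetical inertia values, not the report's results
ks = [1, 2, 3, 4, 5, 6]
inertias = [100.0, 20.0, 15.0, 12.0, 10.0, 9.0]
print(elbow_by_second_difference(ks, inertias))  # 2
```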
K-MEANS CLUSTERING
K-Means clustering (k = 2) is applied after dropping the label column.
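A minimal NumPy sketch of K-Means (Lloyd's algorithm) that also returns the inertia. The farthest-point initialisation is an assumption chosen to keep the example deterministic; scikit-learn's KMeans uses k-means++ by default. The table, and the placement of its label column last, are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation.

    Returns (labels, centroids, inertia).
    """
    X = np.asarray(X, dtype=float)
    # Start at the first row, then repeatedly add the row farthest
    # from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each centroid moves to the mean of its points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    dists = np.sum((X[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
    labels = np.argmin(dists, axis=1)
    inertia = float(np.min(dists, axis=1).sum())
    return labels, centroids, inertia

# Hypothetical scaled data: two feature columns plus a last "label" column
table = np.array([[0.10, 0.20, 0],
                  [0.12, 0.18, 1],
                  [0.90, 0.85, 0],
                  [0.88, 0.92, 1]])
features = table[:, :-1]   # drop the label column before clustering
labels, centroids, inertia = kmeans(features, k=2)
```

Calling `kmeans` for each candidate k and collecting the third return value gives the inertia curve used by the elbow method.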
PRINCIPAL COMPONENT ANALYSIS
Because the dataset has multiple columns, PCA is applied to reduce it to two dimensions so that the clusters can be shown on a 2-D scatter plot.
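A minimal sketch of a 2-component PCA via the singular value decomposition of the mean-centred data. The report presumably used a library implementation; this NumPy version and its sample values are illustrative assumptions.

```python
import numpy as np

def pca_2d(X):
    """Project X onto its first two principal components.

    Returns (projected points, variance explained by each component).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)   # PCA operates on mean-centred data
    # Rows of vt are the principal directions, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    explained_variance = s[:2] ** 2 / (len(X) - 1)
    return centered @ components.T, explained_variance

# Hypothetical 3-feature data
points = np.array([[2.5, 2.4, 1.0], [0.5, 0.7, 0.2], [2.2, 2.9, 1.1],
                   [1.9, 2.2, 0.9], [3.1, 3.0, 1.4], [2.3, 2.7, 1.0]])
proj, ev = pca_2d(points)
```

The two columns of `proj` are the x and y coordinates for the scatter plot; the cluster labels are used only to colour the points.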
VISUALIZING K-MEANS CLUSTERS
The dataset is divided into 2 clusters, which are shown in the scatter plot.
AGGLOMERATIVE HIERARCHICAL CLUSTERING
DENDROGRAM
The dendrogram of the dataset is visualized.
Scatter plots are then visualized for 2, 3 and 4 clusters.
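The report does not state which linkage was used; Ward linkage (scikit-learn's default for AgglomerativeClustering) is assumed here. This naive sketch merges clusters bottom-up and stops at a requested number of clusters, which matches cutting the dendrogram at that level. The three-blob data is hypothetical.

```python
import numpy as np

def agglomerative_ward(X, n_clusters):
    """Naive bottom-up clustering with Ward linkage.

    Every point starts as its own cluster; at each step the pair whose merge
    increases the within-cluster sum of squares the least is merged, until
    n_clusters remain. O(n^3), so only suitable for small examples.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]   # member indices of each cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                na, nb = len(ca), len(cb)
                diff = ca.mean(axis=0) - cb.mean(axis=0)
                # Ward merge cost: increase in total within-cluster sum of squares
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                        # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical data: three well-separated groups of three points
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
                [20, 0], [20, 1], [21, 0]], dtype=float)
labelings = {n: agglomerative_ward(pts, n) for n in (2, 3, 4)}
```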

AGGLOMERATIVE CLUSTERING
SILHOUETTE SCORE
The optimum number of clusters is found by plotting the silhouette score of the dataset for each candidate number of clusters.
In this case, 2 is the optimum number of clusters.
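A minimal sketch of the mean silhouette coefficient with Euclidean distance. It needs at least two clusters; singleton clusters get a score of 0, following the usual convention. The candidate with the highest score is taken as the optimum. The tiny data set is hypothetical.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient over all points.

    For point i: a = mean distance to the other points of its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s = (b - a) / max(a, b).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                  # exclude the point itself
        if not same.any():               # singleton cluster
            scores.append(0.0)
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Hypothetical 1-D example with two tight, well-separated clusters
score = silhouette_score([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])
```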
DBSCAN CLUSTERING
NEAREST NEIGHBOURS
minPts = 2 × dimensions
The k-nearest neighbours of each point are found, and the resulting neighbour distances are plotted.
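A minimal sketch of the sorted k-distance curve that is plotted when choosing eps. This sketch excludes the point itself and uses k = minPts; both choices are assumptions, since conventions differ between tools (some count the point itself as its first neighbour). The 1-D data is hypothetical.

```python
import numpy as np

def k_distances(X, k):
    """Sorted distance from every point to its k-th nearest neighbour.

    The point itself is excluded, so k=1 means the closest other point.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
    dist_sorted = np.sort(dist, axis=1)   # column 0 is each point's zero self-distance
    return np.sort(dist_sorted[:, k])

# Hypothetical 1-D data
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
min_pts = 2 * X.shape[1]                  # minPts = 2 x dimensions
curve = k_distances(X, min_pts)
```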
KNEE LOCATOR
The knee point of the k-distance plot is located with Knee Locator to find the value of eps.

Here, eps = 0.61.
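This is not the Knee Locator implementation used in the report. It is a simplified heuristic with the same goal. Both axes of the sorted (increasing, convex) k-distance curve are normalised to [0, 1], and the knee is taken as the point lying farthest below the straight line joining the first and last points. The curve values are hypothetical.

```python
import numpy as np

def knee_point(y):
    """Return the y-value at the knee of an increasing, convex curve."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    gap = x_n - y_n   # after normalisation the chord is the line y = x
    return float(y[np.argmax(gap)])

# Hypothetical sorted k-distance curve with a sharp rise at the end
curve = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.50, 2.00]
eps = knee_point(curve)
```

Here the knee is 0.15, the last distance before the curve rises sharply.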
DBSCAN CLUSTERING
The points are labelled by cluster, and the composition of the clusters is examined.
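A minimal sketch of DBSCAN followed by a count of points per cluster. As with scikit-learn's `min_samples`, the point itself counts toward `min_pts`. The eps and min_pts values and the data are illustrative only, not the report's eps = 0.61.

```python
import numpy as np
from collections import Counter, deque

def dbscan(X, eps, min_pts):
    """Label each point with a cluster id, or -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it. Clusters grow outward from core points.
    """
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2))
    neighbours = [np.flatnonzero(row <= eps) for row in dist]
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])

    labels = np.full(len(X), -1)
    cluster_id = 0
    for start in range(len(X)):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue               # border points join but do not expand
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels

# Hypothetical data: two small groups and one isolated point
pts = np.array([[0, 0], [0, 0.5], [0.5, 0], [5, 5], [5, 5.5], [5.5, 5], [20, 20]],
               dtype=float)
labels = dbscan(pts, eps=0.8, min_pts=3)
composition = Counter(labels.tolist())   # points per cluster; key -1 is noise
```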
VISUALIZING DBSCAN CLUSTERING
The clusters formed by DBSCAN are visualized after reducing the dimensions with Principal Component Analysis (PCA).
VISUALIZING DBSCAN CLUSTERING
The dataset is divided into 2 clusters, which are shown on the scatter plot.
THANK YOU!
