Exploring Unsupervised Learning Algorithms with the Iris Dataset

General Questions of Understanding

1. Main steps in clustering with the Iris dataset:

○ Preprocess the features, for example by normalizing or scaling them and cleaning the data.
○ Select a clustering algorithm and its parameters, such as k for K-means or eps and min_samples for DBSCAN.
○ Fit the chosen algorithm to the dataset.
○ Visualize the resulting clusters.
○ Evaluate clustering quality, for example with the silhouette score, or by comparing the results with the actual labels in case they are available (a minimal pipeline is sketched below).
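
A minimal sketch of this pipeline, assuming scikit-learn is available; the choice of K-means with k=3 is illustrative, not a recommendation:

    # Steps: load, scale, cluster, evaluate (visualization omitted here).
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score, adjusted_rand_score

    X, y = load_iris(return_X_y=True)

    # 1. Preprocess: scale features to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(X)

    # 2-3. Choose an algorithm and fit it.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

    # 4. Evaluate: an internal metric, plus an external one since Iris has labels.
    print("silhouette:", silhouette_score(X_scaled, labels))
    print("ARI vs. true species:", adjusted_rand_score(y, labels))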

2. Determining clusters without predefined labels: Clustering algorithms locate groups by exploiting patterns in the data, such as local density (e.g. DBSCAN), distance (e.g. hierarchical clustering), or statistical structure (e.g. GMM). Points with similar feature values end up grouped together.

3. Iris dataset suitability:

○ It has clearly separated clusters, at least in the petal and sepal measurements.
○ It is small, so it is computationally easy to handle.
○ Its inherent structure and known species labels make it well suited for evaluating clustering algorithms.

K-means Clustering

1. Optimal cluster centroids: K-means iteratively re-computes centroids, alternating between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, thereby minimizing the sum of squared distances between the points and their centroids.

2. Significance of k and selection:

○ k is the number of clusters. Choosing the right k is important and can be guided by methods such as the elbow method, silhouette analysis, or the gap statistic.
3. Handling overlapping clusters: K-means assumes clusters are spherical and do not overlap. Because it assigns points purely by distance to the nearest centroid, it may misassign points that lie in an overlapping region.

4. Effect of different initial centroids: K-means is sensitive to the initial positions of the centroids and can converge to different results. Techniques such as k-means++ initialization improve consistency (see the sketch below).
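
A hedged sketch of choosing k with the elbow method while using k-means++ initialization; the range of k values is illustrative:

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    X = StandardScaler().fit_transform(load_iris().data)

    # Inertia (sum of squared distances to the centroids) for a range of k;
    # the "elbow" where the curve flattens suggests a reasonable k.
    for k in range(1, 8):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
        km.fit(X)
        print(k, round(km.inertia_, 1))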

DBSCAN

1. Parameters eps and min_samples:

○ eps specifies the neighborhood radius that determines which points count as neighbors.
○ min_samples is the minimum number of points required within that radius to form a dense region (a core point).
2. Data distributions suitable for DBSCAN:

○ Well suited to arbitrarily shaped clusters and data containing some noise.
○ Struggles with datasets whose clusters have dissimilar densities, and with high-dimensional data.

3. Noise identification: DBSCAN labels points that do not belong to any cluster as noise. Unlike K-means, it does not force every point into a cluster, which makes it less sensitive to outliers (a short example follows).
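
A sketch of DBSCAN on the scaled Iris data, assuming scikit-learn; the eps and min_samples values below are plausible starting points rather than tuned recommendations:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import DBSCAN

    X = StandardScaler().fit_transform(load_iris().data)
    labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

    # scikit-learn marks noise points with the label -1.
    n_noise = int(np.sum(labels == -1))
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(f"{n_clusters} clusters, {n_noise} noise points")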

Hierarchical Clustering

1. Agglomerative vs. divisive:

○ Agglomerative: Each point begins as its own cluster, and clusters are merged successively.
○ Divisive: All points begin in one cluster, which is split recursively.
2. Linkage method effects:

○ Single linkage: Merges based on the nearest pair of points and can build elongated, chain-like clusters.
○ Complete linkage: Merges based on the farthest pair of points, yielding compact clusters.
○ Average linkage: Balances between the two.

3. Deciding the number of clusters: The number of clusters is determined by cutting the dendrogram horizontally at a chosen height, usually across the largest vertical distance between successive merges (as sketched below).
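
A sketch of hierarchical clustering using SciPy, whose dendrogram utilities make the cut height explicit; the "average" linkage choice is illustrative:

    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler

    X = StandardScaler().fit_transform(load_iris().data)

    # Build the merge tree; each row of Z records one merge and its distance.
    Z = linkage(X, method="average")

    # Cutting the tree into a fixed number of clusters (here 3) is equivalent
    # to cutting the dendrogram at the corresponding height.
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(labels[:10])

    # To inspect the tree visually (requires matplotlib):
    # import matplotlib.pyplot as plt; dendrogram(Z); plt.show()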

Mean Shift Clustering

1. Cluster centroids: Mean Shift locates density peaks, known as modes, by iteratively shifting points in the direction of higher point density; each mode becomes a cluster centroid.

2. Bandwidth parameter: The bandwidth dictates the sphere of influence of each point and strongly affects the number of clusters: a small bandwidth produces many clusters, while a large bandwidth merges them into few.

3. Handling non-spherical clusters: In contrast to K-means, Mean Shift is not limited to finding clusters of a particular geometry and can capture clusters of arbitrary shape (see the sketch below).
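
A sketch of Mean Shift on Iris, assuming scikit-learn; estimate_bandwidth provides a data-driven bandwidth, and the quantile value is illustrative:

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import MeanShift, estimate_bandwidth

    X = StandardScaler().fit_transform(load_iris().data)

    # Larger quantile -> larger bandwidth -> fewer clusters.
    bw = estimate_bandwidth(X, quantile=0.2)
    ms = MeanShift(bandwidth=bw).fit(X)
    print("clusters found:", len(ms.cluster_centers_))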

Gaussian Mixture Models (GMM)

1. Data modeling: GMM models the data as a weighted mixture of several Gaussian densities, giving a probabilistic (soft) clustering method.

2. Role of EM algorithm: The Expectation-Maximization algorithm alternates between computing soft cluster assignments (E-step) and updating the parameters of each Gaussian component to maximize the likelihood (M-step), iterating until convergence.

3. Handling varying shapes and sizes: An advantage of GMM is its ability to model elliptical clusters of varying shape, size, and orientation, thanks to its probabilistic approach and the flexibility of its covariance matrices (a sketch is given below).
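
A sketch of a GMM fit with full covariance matrices, which let each component take its own elliptical shape; n_components=3 is illustrative, since Iris has three species:

    from sklearn.datasets import load_iris
    from sklearn.mixture import GaussianMixture

    X = load_iris().data

    gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
    labels = gmm.fit_predict(X)

    # Soft assignments: each row gives the probability of each component.
    print(gmm.predict_proba(X[:3]).round(3))
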
Comparative Analysis

1. Clustering results on Iris dataset:

○ K-means can struggle with the overlapping categories.
○ DBSCAN can discover noise but struggles with clusters of differing densities.
○ Hierarchical clustering facilitates the assessment of relationships through an easy-to-understand tree diagram (dendrogram).
○ GMM is flexible in the shapes of the clusters.

2. Best algorithm for Iris species: Performance varies with the chosen parameters, but GMM's main benefit may lie in how well it separates the clusters, because it models them with probability distributions (a comparison sketch follows the next list).

3. Strengths and weaknesses:

○ K-means: Fast and simple; does not work well with non-spherical clusters.
○ DBSCAN: Handles noise well; requires careful parameter tuning.
○ Hierarchical: Interpretable; computationally expensive on large datasets.
○ GMM: Flexible; computationally heavier.
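
A sketch comparing the four algorithms via the Adjusted Rand Index against the true species labels, assuming scikit-learn; all parameters are illustrative:

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import adjusted_rand_score

    X, y = load_iris(return_X_y=True)
    Xs = StandardScaler().fit_transform(X)

    models = {
        "K-means": KMeans(n_clusters=3, n_init=10, random_state=0),
        "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
        "Hierarchical": AgglomerativeClustering(n_clusters=3),
        "GMM": GaussianMixture(n_components=3, random_state=0),
    }
    for name, model in models.items():
        labels = model.fit_predict(Xs)
        print(f"{name}: ARI = {adjusted_rand_score(y, labels):.3f}")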

Visualization and Evaluation

1. Visualization:

○ Dimensionality-reduction methods such as PCA or t-SNE can project the data into 2D or 3D space for plotting.
○ Clusters can be shown as scatter plots with different colors or markers.
2. Evaluation metrics:

○ Silhouette score: Measures how well each point fits its own cluster compared to the nearest neighboring cluster.
○ Davies-Bouldin index: Assesses cluster quality from the ratio of within-cluster scatter to between-cluster separation (lower is better).
○ Adjusted Rand Index or Normalized Mutual Information: Compare the clusters against ground-truth labels when a labeled dataset is available (a combined sketch follows).
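
A sketch combining both steps, assuming scikit-learn and matplotlib; the K-means labels used here are illustrative:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                                 adjusted_rand_score)

    X, y = load_iris(return_X_y=True)
    Xs = StandardScaler().fit_transform(X)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)

    # 2D projection for a colored scatter plot, one color per cluster.
    X2 = PCA(n_components=2).fit_transform(Xs)
    plt.scatter(X2[:, 0], X2[:, 1], c=labels)
    plt.xlabel("PC1"); plt.ylabel("PC2")
    plt.show()

    print("silhouette:", silhouette_score(Xs, labels))
    print("Davies-Bouldin:", davies_bouldin_score(Xs, labels))
    print("ARI:", adjusted_rand_score(y, labels))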
