Unit IV Unsupervised Learning
Unsupervised learning is a type of machine learning where the algorithm is trained on data that has
no labelled outcomes or target variables. The goal is to identify patterns, structures, or relationships
in the data without explicit supervision. In unsupervised learning, the algorithm tries to group data
points (clustering), reduce dimensions (dimensionality reduction), or discover hidden structures
(association rules) based on the inherent patterns in the data.
Cluster analysis is a technique used in data analysis to group similar data points together based on
certain characteristics or features. The goal is to identify patterns or structures in data, where data
points within the same group (or cluster) are more similar to each other than to those in other
groups. It's widely used in fields like market research, pattern recognition, and machine learning.
Common clustering algorithms include K-means, hierarchical clustering, and DBSCAN.
Partition methods are clustering techniques that divide a dataset into non-overlapping groups (or
clusters) such that each data point belongs to exactly one cluster. These methods require the user
to specify the desired number of clusters (K) in advance. The goal is to optimize a specific criterion,
like minimizing the variance within clusters or maximizing the distance between clusters.
K-Means: A clustering algorithm that partitions data into a predefined number of clusters by
minimizing the variance within each cluster.
K-Medoids: Similar to K-means, but instead of using the mean of the points in a cluster, it uses an
actual data point (medoid) to represent the cluster.
K-Means:
Method: This algorithm starts by selecting K initial centroids (randomly or using some
strategy). Each data point is assigned to the nearest centroid, forming K clusters. Then, the
centroids are recalculated as the mean of the points in each cluster. The process repeats
until the centroids no longer change significantly.
Pros: Efficient for large datasets, works well when clusters are spherical and equally sized.
Cons: Sensitive to initial centroid placement, struggles with non-spherical or unevenly sized
clusters.
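The steps above map directly onto scikit-learn's KMeans. A minimal sketch (the synthetic blobs, n_clusters=3, and random seeds are illustrative choices, not prescribed values):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 300 points drawn around 3 centers
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# n_init restarts the algorithm from several initial centroid placements,
# which mitigates the sensitivity to initialization noted above
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)     # cluster assignment for each point

print(kmeans.cluster_centers_)     # final centroids (mean of each cluster)
print(kmeans.inertia_)             # within-cluster sum of squared distances
```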
K-Medoids:
Method: Like K-means, K-medoids aims to partition the data into K clusters, but it uses actual
points (medoids) as cluster centers. Instead of minimizing variance, it minimizes the total
pairwise dissimilarity between points in the cluster and their medoid. Algorithms like
Partitioning Around Medoids (PAM) are often used.
Pros: More robust to outliers since the medoid is an actual data point and not an average.
Cons: Computationally more expensive than K-means, especially for large datasets.
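Plain scikit-learn does not ship a k-medoids estimator, so the idea can be sketched directly in NumPy. The following is a simplified Voronoi-iteration version (not the full PAM swap procedure); the function name, seed, and toy data are illustrative:

```python
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(len(X), size=k, replace=False)   # random initial medoids
    for _ in range(n_iter):
        # Assign each point to its nearest medoid
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimizing total dissimilarity to its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
medoids, labels = k_medoids(X, k=2)
print(X[medoids])   # the two representative points (medoids) are actual data points
```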
Hierarchical Methods: Agglomerative and Divisive Hierarchical Clustering
Agglomerative Hierarchical Clustering: A bottom-up approach where each data point starts as its
own cluster, and clusters are progressively merged based on similarity.
Divisive Hierarchical Clustering: A top-down approach where all data points start in one cluster, and
the cluster is recursively split into smaller clusters.
Agglomerative Hierarchical Clustering:
Method: In this approach, each data point is initially treated as its own cluster. The algorithm
then iteratively merges the closest clusters based on a distance metric (e.g., Euclidean
distance). This process continues until all points belong to a single cluster or a stopping
criterion is met. A dendrogram (tree-like diagram) is often used to visualize the merges.
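A minimal sketch of this bottom-up procedure using SciPy's hierarchical clustering utilities (the toy data, "ward" linkage, and the three-cluster cut are illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                        # toy 2-D data
Z = linkage(X, method="ward")                    # bottom-up merge history, closest pairs first
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 flat clusters
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the merge tree (needs matplotlib)
```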
Divisive Hierarchical Clustering:
Method: This is the opposite of agglomerative clustering. It starts with all data points in one
single cluster and then recursively splits the cluster into smaller clusters. The splits are based
on maximizing the dissimilarity between resulting clusters. Like agglomerative clustering, a
dendrogram can be used to visualize the process.
Pros: Useful for capturing the global structure of the data first; can work better than
agglomerative clustering when the important splits are at the top level.
Cons: More computationally expensive than agglomerative clustering, especially with large
datasets.
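Divisive clustering is less commonly available off the shelf. One rough way to illustrate the top-down idea is recursive bisection, repeatedly splitting the largest cluster with a 2-means step (a bisecting-style sketch, not the classical DIANA algorithm; the splitting rule, names, and toy data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, k, seed=0):
    labels = np.zeros(len(X), dtype=int)          # start with every point in one cluster
    while labels.max() + 1 < k:
        sizes = np.bincount(labels)               # choose the largest cluster to split next
        target = int(np.argmax(sizes))
        idx = np.where(labels == target)[0]
        # 2-means split of the chosen cluster
        sub = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
        labels[idx[sub == 1]] = labels.max() + 1  # second half becomes a new cluster
    return labels

X = np.vstack([np.random.rand(60, 2), np.random.rand(60, 2) + 3])
print(divisive_clustering(X, k=4))
```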
Dynamic Clustering: A clustering method that adapts to changes in data over time, updating clusters
as new data points become available.
Multi-view Clustering: A clustering approach that combines information from multiple sources or
views to create more robust clusters.
Dynamic Clustering:
Method: Dynamic clustering adjusts clusters over time to reflect changes in the data. It is
commonly used in situations where data evolves, such as in streaming data or when periodic
updates are needed. Algorithms for dynamic clustering continuously adjust cluster members
or centers as new data is added, ensuring that the clustering structure adapts to emerging
patterns or trends.
Pros: Ideal for real-time data or systems where data continuously changes (e.g., sensor data,
online platforms).
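There is no single standard "dynamic clustering" routine, but the incremental idea can be illustrated with scikit-learn's MiniBatchKMeans, whose partial_fit updates the centroids as each new batch arrives (the simulated stream and parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, random_state=0)

for _ in range(100):                  # simulate 100 batches arriving over time
    batch = np.random.rand(32, 2)     # newly observed points
    model.partial_fit(batch)          # centroids are updated incrementally

print(model.cluster_centers_)         # current cluster centers after the stream
```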
Multi-view Clustering:
Method: This technique involves combining multiple data representations (or "views") of the
same dataset to create a more accurate and comprehensive clustering solution. Each view
represents different perspectives or feature sets (e.g., text, images, or graph-based data),
and the algorithm seeks to find clusters that are consistent across these views. By fusing
information from these multiple sources, multi-view clustering can improve the quality and
robustness of the clustering outcome.
Pros: Can leverage complementary information from different sources, improving clustering
accuracy in complex datasets (e.g., multimodal data such as images and text).
Cons: Requires multiple data views, which may not always be available or easy to combine;
computational complexity can be high when integrating diverse data sources.
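Dedicated multi-view algorithms are more involved, but a naive "early fusion" baseline conveys the basic idea: standardize each view, concatenate the features, and cluster the joint representation (the view shapes and contents below are illustrative placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

view_text = np.random.rand(200, 50)    # e.g., TF-IDF features of 200 items
view_image = np.random.rand(200, 128)  # e.g., image embeddings of the same 200 items

# Standardize each view separately, then concatenate into one joint representation
fused = np.hstack([
    StandardScaler().fit_transform(view_text),
    StandardScaler().fit_transform(view_image),
])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(fused)
print(labels[:10])
```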
Internal Evaluation Metrics: Measures the quality of clustering based on the data and the clustering
itself, without reference to external ground truth.
External Evaluation Metrics: Compares the clustering results to an external, known classification
(ground truth) to measure its accuracy.
Method: Internal metrics assess the quality of clustering based solely on the data and the
structure of the clusters without external information. They typically focus on cohesion (how
close the points within a cluster are) and separation (how distinct the clusters are from each
other). Common internal metrics include:
o Silhouette Score: Measures how similar a point is to its own cluster compared to
other clusters.
o Davies-Bouldin Index: Evaluates the average similarity ratio of each cluster with the
cluster that is most similar to it.
o Dunn Index: Measures the ratio of the minimum inter-cluster distance to the
maximum intra-cluster distance.
Pros: Useful when no ground truth is available. Helps in fine-tuning clustering algorithms.
Cons: May not fully capture the true quality of clusters, especially in ambiguous cases where
ground truth is unknown.
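A minimal sketch of computing two of these internal metrics with scikit-learn (the Dunn index is not included in scikit-learn; the data and cluster count are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

print(silhouette_score(X, labels))       # closer to +1 means tighter, better-separated clusters
print(davies_bouldin_score(X, labels))   # lower values indicate better clustering
```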
Method: External metrics compare the predicted clustering against known ground-truth labels, typically by checking how pairs of points are grouped in each. Common external metrics include:
o Rand Index: Measures the fraction of point pairs on which the two groupings agree, i.e., pairs placed in the same cluster in both the predicted and actual classifications, or in different clusters in both.
o Adjusted Rand Index (ARI): A variation of the Rand Index that adjusts for chance
groupings, giving a more accurate measure.
Pros: Provides a clear, quantitative evaluation when ground truth is available, making it
easier to compare clustering algorithms.
Cons: Requires the availability of ground truth labels, which may not always be present or
reliable.
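A minimal sketch of the external metrics with scikit-learn, using small hand-made labelings for illustration:

```python
from sklearn.metrics import rand_score, adjusted_rand_score

true_labels = [0, 0, 1, 1, 2, 2]          # known ground-truth classes
pred_labels = [1, 1, 0, 0, 2, 2]          # same grouping, different label names

print(rand_score(true_labels, pred_labels))           # 1.0: every pair is grouped consistently
print(adjusted_rand_score(true_labels, pred_labels))  # chance-corrected; also 1.0 here
```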