
Clustering and dimensionality reduction techniques

(PCA, t-SNE, K-means)

Course: Machine Learning in Healthcare


Instructor: Md. Maruf Hossain
Session: 2022-23

Department of Biomedical Engineering


Islamic University

October 30, 2024


Outline

1. Introduction to clustering and dimensionality reduction

2. Types of Clustering

3. Dimensionality Reduction

4. Types of Dimensionality Reduction

5. Principal Component Analysis

6. How does PCA work?

7. Steps of PCA

8. Advantages and Disadvantages of PCA



Introduction to clustering and dimensionality reduction


In data science and machine learning, clustering and dimensionality reduction are
techniques used to simplify, organize, and interpret complex datasets.
Both approaches aim to make data analysis easier by revealing patterns and
structures within data, often used in exploratory data analysis and preprocessing for
machine learning.
What is Clustering?
Clustering is an unsupervised learning technique that groups data points into clusters
based on similarity.
It’s particularly useful for finding patterns or structures in unlabeled data.
In clustering, clusters are groups in which data points are more similar to each other
than to points in other groups.
Applications include image segmentation and anomaly detection.
e.g., K-means clustering
Types of Clustering
K-means clustering
K-means is a simple and fast clustering algorithm that partitions data into K clusters.
Here’s how it works (a short scikit-learn sketch is given at the end of this slide):
Initialization: Choose K initial cluster centroids (center points) randomly.
Assignment: Assign each data point to the nearest centroid based on distance (often Euclidean).
Update: Recalculate the centroids of the clusters by finding the mean of points within each cluster.
Repeat: Repeat the assignment and update steps until the centroids no longer change significantly, or a maximum number of iterations is reached.
Advantages: Simple and computationally efficient.
Works well with large datasets.
Limitations:
Sensitive to the initial choice of centroids.
Assumes clusters are spherical and equally sized, which may not be true for complex data
structures.
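As an illustration (not part of the original slides; the six 2-D points, K = 2, and the random_state value are made-up choices), a minimal K-means run with scikit-learn could look like this:

import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points forming two rough groups
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [8.0, 8.0], [9.0, 11.0], [8.5, 9.0]])

# n_clusters is K; multiple n_init restarts reduce sensitivity to the initial centroids
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster labels:", labels)
print("Centroids:", kmeans.cluster_centers_)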
Dimensionality Reduction
Dimensionality reduction techniques reduce the number of features (dimensions) in a
dataset while preserving as much information as possible.

Dimensionality reduction can help with:


- Reducing computational cost by lowering the number of features.
- Visualizing high-dimensional data in 2D or 3D.
- Reducing noise and improving model performance.

Figure 1 – Dimensionality Reduction.
Types of Dimensionality Reduction
There are two popular dimensionality reduction techniques.
1. Principal Component Analysis (PCA), and
2. t-Distributed Stochastic Neighbor Embedding (t-SNE).
t-Distributed Stochastic Neighbor Embedding (t-SNE)
t-SNE is a non-linear dimensionality reduction technique that’s especially useful for
visualizing high-dimensional data in 2D or 3D.
Calculate Pairwise Similarities: Calculate probabilities of similarity between points
in high-dimensional space.
Map to Low Dimensions: Place points in 2D or 3D so that these similarities are
preserved as closely as possible.
Visualization: The result is a 2D or 3D scatter plot showing clusters or patterns in
the data.
Advantages: Great for visualizing complex, non-linear relationships in data; it captures
local structure especially well.
Limitations: Computationally intensive, and not ideal for general-purpose dimensionality
reduction beyond visualization.
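As a hedged sketch (not from the slides; the Iris dataset and the perplexity value are illustrative assumptions), a 2-D t-SNE visualization with scikit-learn could look like:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)

# perplexity roughly sets the effective neighborhood size considered per point
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.title("t-SNE embedding of the Iris dataset")
plt.show()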
Principal Component Analysis
Principal component analysis (PCA) is a dimensionality reduction and machine
learning method used to simplify a large data set into a smaller one.
It is often used to reduce the dimensionality of large data sets by transforming a large
set of variables into a smaller one that still contains most of the information in the
original set.
What Are Principal Components? Principal components are new variables that are
constructed as linear combinations or mixtures of the initial variables.

Figure 2 – Principal components.


Principal Component Analysis


How PCA Constructs the Principal Components
There are as many principal components as there are variables in the data. They are
constructed so that the first principal component accounts for the largest possible
variance in the data set, and each subsequent component captures the largest remaining
variance while being uncorrelated with the previous ones.

How does PCA work?

Start with dataset (m observations, n features)

Calculate covariance matrix

Find eigenvectors and eigenvalues

Identify principal components (directions of maximum variance)

Sort by eigenvalues and select significant principal components


Steps of PCA

1. Standardization: Scale the features to have a mean of 0 and a standard deviation of 1 (optional but recommended).
2. Compute Covariance Matrix: This captures how the features vary together.
3. Compute Eigenvectors and Eigenvalues: Eigenvectors represent the directions of
greatest variance, and eigenvalues tell you how much variance each eigenvector explains.
4. Select Principal Components: Choose the top k eigenvectors based on their
corresponding eigenvalues to form the new feature subspace.
5. Projection: Project the original data onto the new feature subspace formed by the
selected eigenvectors. Or project the original data points onto the new principal
component axes.

Step 1: Standardization
First, we need to standardize our dataset to ensure that each variable has a mean of 0
and a standard deviation of 1.
Z = (X − μ) / σ
Here,
• μ is the mean of the independent features, μ = {μ1, μ2, ..., μm}
• σ is the standard deviation of the independent features, σ = {σ1, σ2, ..., σm}
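A small NumPy sketch of this step (illustrative only; it reuses the 5 × 2 sample dataset from the implementation slide at the end of the deck):

import numpy as np

# Example 5 x 2 dataset (5 observations, 2 features)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Z = (X - mu) / sigma, applied feature-wise (column-wise)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0))  # approximately 0 for each feature
print(Z.std(axis=0))   # 1 for each feature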

Step 2: Covariance Matrix Computation


Covariance measures the strength of joint variability between two or more variables,
indicating how much they change in relation to each other. To find the covariance we
can use the formula:
cov(x1, x2) = Σᵢ (x1i − x̄1)(x2i − x̄2) / (n − 1), where the sum runs over i = 1, ..., n
The value of covariance can be positive, negative, or zero.
• Positive: As x1 increases, x2 also increases.
• Negative: As x1 increases, x2 decreases.
• Zero: No direct relation.
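Continuing the illustrative example (self-contained here; the same sample data as above, not from the slides), the covariance matrix can be computed as follows. Note that np.cov uses the n − 1 divisor from the formula above:

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized data from Step 1

# rowvar=False treats each column as one variable (feature)
cov_matrix = np.cov(Z, rowvar=False)
print(cov_matrix)  # 2 x 2 symmetric matrix; off-diagonal entries are the covariances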
Steps 3 & 4 - Compute Eigenvectors and Eigenvalues of the Covariance Matrix to Identify
Principal Components:
• Let A be a square matrix, ν a vector, and λ a scalar that satisfies
Aν = λν
then λ is called the eigenvalue associated with the eigenvector ν of A.
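A sketch of this computation in NumPy (illustrative continuation of the same example; np.linalg.eigh is used because the covariance matrix is symmetric):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(Z, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort in descending order of eigenvalue (variance explained)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

print("Eigenvalues:", eigenvalues)
print("Eigenvectors (as columns):", eigenvectors)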

Step 5 - Projection: Transform the samples onto the new subspace:


• In the last step, we use the projection matrix W that we just computed (its columns
are the selected eigenvectors) to transform our samples onto the new subspace via the
equation y = Wᵀ × x, where Wᵀ is the transpose of the matrix W.
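A sketch of the projection step, continuing the same illustrative example (choosing k = 1 here is arbitrary, purely for demonstration):

import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov_matrix = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
order = np.argsort(eigenvalues)[::-1]
eigenvectors = eigenvectors[:, order]

# Keep the top k eigenvectors as the projection matrix W (n_features x k)
k = 1
W = eigenvectors[:, :k]

# Project every sample at once: Y = Z W (per sample, y = W^T x)
Y = Z @ W
print(Y)  # samples expressed in the new k-dimensional subspace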
Advantages and Disadvantages of PCA


Advantages of PCA
1. Dimensionality Reduction: Reduces dataset complexity by lowering the number of
variables, improving analysis and performance.
2. Feature Selection: Helps in identifying the most important features, enhancing
machine learning model efficiency.
3. Data Visualization: Projects high-dimensional data into 2D or 3D for easier
interpretation.
4. Multicollinearity: Addresses issues of correlated features by creating uncorrelated
variables, useful for regression.
5. Noise Reduction: Improves data quality by removing components with low variance,
reducing noise.
6. Data Compression: Lowers storage needs by representing data with fewer principal
components.
7. Outlier Detection: Identifies outliers as points that deviate significantly in the
principal component space.

Disadvantages of PCA
1. Interpretation: Principal components are linear combinations and can be hard to
interpret.
2. Data Scaling: Sensitive to data scaling; requires proper normalization for accurate
results.
3. Information Loss: May lead to information loss if too few components are retained.
4. Non-linear Relationships: Assumes linearity, limiting its effectiveness for non-linear
data.
5. Computational Complexity: Computationally expensive for large datasets with
many variables.
6. Overfitting: Risk of overfitting if too many components are retained or if applied to
small datasets.




Implementation of PCA in ML

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Sample dataset
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 2: Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Print results
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Principal components:", pca.components_)
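Running this prints the fraction of total variance captured by each principal component (with two components kept on two standardized features, the ratios sum to 1) and the component directions as the rows of pca.components_.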

