3-2 YR-SEM, R22 Regulation, CSM

UNIT 4
Dimensionality Reduction
Dimensionality reduction refers to techniques used to reduce the number of input features or
dimensions in a dataset while preserving as much relevant information as possible. It is essential for
handling high-dimensional data, improving computational efficiency, reducing storage requirements,
and mitigating issues like the curse of dimensionality.

Why Dimensionality Reduction?


1. Improves Model Performance:
o Reduces overfitting by removing irrelevant or redundant features.
o Speeds up training and inference by simplifying the dataset.
2. Visualization:
o Enables visualization of high-dimensional data in 2D or 3D spaces.
3. Mitigates Curse of Dimensionality:
o High-dimensional spaces can lead to sparse data, making distances and patterns less
meaningful.
4. Removes Noise:
o Reduces the impact of irrelevant or noisy features.

Types of Dimensionality Reduction


1. Feature Selection
Selects a subset of relevant features from the original dataset.
 Methods:
o Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information.
o Wrapper Methods: Evaluate subsets of features based on model performance (e.g.,
recursive feature elimination).
o Embedded Methods: Select features as part of the model training process (e.g.,
LASSO, decision trees).
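A minimal sketch of the three families using scikit-learn; the dataset (breast cancer), the number of kept features (10), and the regularisation strength are illustrative assumptions rather than recommendations.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: keep the 10 features with the highest mutual information with the label
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: recursive feature elimination driven by a model's performance
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)
X_wrapper = X[:, rfe.support_]

# Embedded: an L1-penalised (LASSO-style) model zeroes out coefficients during training
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
X_embedded = X[:, l1.coef_.ravel() != 0]

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```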

2. Feature Extraction
Transforms the data into a lower-dimensional space by creating new features.
 Linear Methods:
o Principal Component Analysis (PCA):

 Projects data onto a new set of orthogonal axes (principal components) that
maximize variance.
 Works by eigenvalue decomposition of the covariance matrix.
 Works best when the dominant structure in the data is linear (i.e., captured by linear correlations among features).
o Linear Discriminant Analysis (LDA):
 Focuses on maximizing class separability in supervised learning.
 Finds linear combinations of features that best separate different classes.
 Non-Linear Methods:
o t-SNE (t-Distributed Stochastic Neighbor Embedding):
 Preserves local structure in data while embedding it into lower dimensions.
 Often used for visualization but not suitable for downstream tasks.
o UMAP (Uniform Manifold Approximation and Projection):
 Similar to t-SNE but faster and preserves both global and local structures.
o Kernel PCA:
 Extends PCA to capture non-linear structures using kernel functions.
o Isomap:
 Uses geodesic distances instead of Euclidean distances to capture manifold
structures.
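As a quick illustration of feature extraction, the sketch below applies one linear and two non-linear methods from scikit-learn to the 64-dimensional digits dataset; the dataset and the choice of 2 output dimensions are assumptions for illustration only (UMAP is omitted because it lives in the separate umap-learn package).
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 1797 samples, 64 features

X_pca  = PCA(n_components=2).fit_transform(X)                      # linear projection
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)  # non-linear (kernel) PCA
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)      # preserves local structure

print(X_pca.shape, X_kpca.shape, X_tsne.shape)   # each is (1797, 2)
```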

Key Algorithms for Dimensionality Reduction

Algorithm | Type | Description
PCA | Linear | Identifies principal components that capture maximum variance.
LDA | Linear | Maximizes separability among classes (supervised).
t-SNE | Non-linear | Visualizes high-dimensional data by preserving pairwise similarities in a lower-dimensional space.
UMAP | Non-linear | Similar to t-SNE but faster and scalable.
Autoencoders | Non-linear | Neural networks designed to learn compressed representations of data.
Factor Analysis | Linear | Models data variability as a function of latent variables.
Independent Component Analysis (ICA) | Linear | Separates data into statistically independent components.
Isomap | Non-linear | Preserves the geodesic distances in a lower-dimensional manifold.
LLE (Locally Linear Embedding) | Non-linear | Captures the local neighborhood relationships for dimensionality reduction.

Steps in Dimensionality Reduction


1. Preprocessing:
o Standardize or normalize features to ensure consistent scaling.
o Handle missing data.
2. Choose Technique:
o Decide based on data characteristics (linear or non-linear relationships, need for
interpretability, etc.).
3. Apply Algorithm:
o Reduce dimensions using the chosen technique.
4. Validate Results:
o Check for information loss using metrics like explained variance, reconstruction error,
or downstream model performance.
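A compact end-to-end sketch of these four steps with scikit-learn; the wine dataset and the 95% variance threshold are assumptions made purely for illustration.
```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)

# 1. Preprocessing: standardize so every feature has zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# 2-3. Choose and apply a technique: keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_red = pca.fit_transform(X_std)

# 4. Validate: explained variance and reconstruction error
print("kept", X_red.shape[1], "of", X.shape[1], "dimensions")
print("explained variance:", pca.explained_variance_ratio_.sum())
X_back = pca.inverse_transform(X_red)
print("reconstruction error:", np.mean((X_std - X_back) ** 2))
```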

Applications
1. Visualization:
o Explore and understand high-dimensional data in 2D or 3D.
2. Preprocessing:
o Simplify datasets before training machine learning models.
3. Noise Reduction:
o Remove redundant or irrelevant features to improve model performance.
4. Genomics:
o Identify patterns in genetic data with thousands of features.
5. Image Compression:
o Reduce pixel data while preserving the image's key characteristics.

Advantages
 Simplifies data, making it easier to analyze and visualize.

 Reduces computational cost and storage requirements.


 Improves model interpretability.
 Mitigates the risk of overfitting.

Disadvantages
 Risk of information loss if not applied carefully.
 Some techniques (e.g., t-SNE) are computationally intensive for large datasets.
 Feature extraction methods create new features, which might lose interpretability.
 Non-linear techniques like t-SNE and UMAP are mainly for visualization and not suitable for all
downstream tasks.

Linear Discriminant Analysis


Linear Discriminant Analysis (LDA) is a supervised machine learning technique used for
dimensionality reduction and classification. It is commonly used when the data are labeled
and the goal is to separate different classes in the feature space. Here’s a breakdown of the
key components of LDA:
Key Concepts:
1. Purpose:
o LDA is primarily used for classification, but it can also be used for dimensionality
reduction when there are more features than necessary.
o The goal is to find a linear combination of features that best separates two or more
classes.
2. Dimensionality Reduction:
o LDA reduces the dimensionality of data while maintaining as much information as
possible to distinguish between the classes.
o Unlike Principal Component Analysis (PCA), which focuses on maximizing variance,
LDA aims to maximize the separation between multiple classes.
3. How it Works:
o LDA works by computing a projection of the data points onto a lower-dimensional
space. This projection maximizes the distance between the means of the classes
while minimizing the variance within each class.
o The main idea is to maximize the ratio of between-class variance to within-class
variance, which is the criterion for the best separation.
4. Steps Involved:
o Compute the mean vectors for each class.
o Compute the scatter matrices:
 Within-class scatter matrix (measures the spread of the points within each class).
 Between-class scatter matrix (measures the separation between the class means).
o Calculate the eigenvalues and eigenvectors of the matrix formed by the inverse of the
within-class scatter matrix multiplied by the between-class scatter matrix.
o Select the top eigenvectors (based on the largest eigenvalues) to form the
transformation matrix that will be used for the projection.
5. Assumptions:
o The data for each class should be normally distributed.
o The classes should have the same covariance (homoscedasticity).
o The classes are assumed to be linearly separable.
6. Application:
o Classification: Once the projection matrix is obtained, it can be used to classify new
data points by projecting them onto the lower-dimensional space and determining
which class they are closest to.
o Dimensionality Reduction: In problems where there are many features, LDA can
reduce the number of dimensions while maintaining the class-separability.
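The following NumPy sketch walks through the scatter-matrix steps above on the Iris dataset; keeping 2 discriminant directions is an assumption for illustration, and scikit-learn's LinearDiscriminantAnalysis packages the same computation.
```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

S_W = np.zeros((n_features, n_features))   # within-class scatter
S_B = np.zeros((n_features, n_features))   # between-class scatter
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T @ (X_c - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * diff @ diff.T

# Eigen-decomposition of inv(S_W) @ S_B; keep the top eigenvectors as the projection
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:2]]

X_lda = X @ W          # data projected onto the two linear discriminants
print(X_lda.shape)     # (150, 2)
```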

Principal Component Analysis


Principal Component Analysis (PCA) is an unsupervised machine learning technique primarily
used for dimensionality reduction and feature extraction. It transforms a dataset with
potentially correlated variables into a set of linearly uncorrelated variables called principal
components.
Key Concepts of PCA
1. Purpose:
 Reduce the dimensionality of a dataset while retaining as much of the variability in the data
as possible.
 Identify patterns and highlight strong relationships among features.
 Remove redundant or less informative features.
2. How PCA Works:
 PCA identifies the directions (principal components) along which the variance in the data is
maximized.
 The first principal component (PC1) captures the largest variance in the data. The second
principal component (PC2) captures the next largest variance, and so on, under the constraint
that each subsequent component is orthogonal to the previous ones.
3. Steps of PCA:
1. Standardize the Data:
o Center the data by subtracting the mean from each feature.
o Scale the data to have unit variance if features have different scales.

2. Compute the Covariance Matrix:


o Calculate the covariance matrix to understand how the variables in the dataset are
related.
3. Perform Eigen Decomposition:
o Compute the eigenvalues and eigenvectors of the covariance matrix.
o Eigenvalues represent the amount of variance captured by each principal component.
o Eigenvectors represent the directions (axes) of the principal components.
4. Select Principal Components:
o Rank the eigenvalues in descending order and select the top k components that
capture the majority of the variance (based on a cumulative variance threshold, e.g.,
95%).
5. Project the Data:
o Transform the original dataset by projecting it onto the selected eigenvectors to form
the new dataset in the reduced-dimensional space.
4. Mathematical Formulation:
 The eigenvectors corresponding to the largest eigenvalues form the principal components.
5. Visualization:
 In 2D or 3D data, the principal components can be visualized as axes along which the data's
spread (variance) is maximized.
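A from-scratch NumPy sketch of the five PCA steps above; the synthetic correlated data and the 95% cumulative-variance threshold are assumptions chosen only to keep the example self-contained.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data

# 1. Standardize: center the data (all features here share a scale)
X_c = X - X.mean(axis=0)

# 2. Covariance matrix
cov = np.cov(X_c, rowvar=False)

# 3. Eigen decomposition (eigh because the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Select the top-k components covering at least 95% of the variance
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95)) + 1

# 5. Project the data onto the selected eigenvectors
X_pca = X_c @ eigvecs[:, :k]
print(k, X_pca.shape)
```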

Applications of PCA:
1. Data Preprocessing:
o Reduce dimensionality before applying machine learning algorithms to prevent
overfitting and reduce computation time.
2. Noise Reduction:
o Eliminate features that contribute minimally to the overall variance, which often
represent noise.
3. Visualization:
o Reduce high-dimensional data to 2D or 3D for easier visualization and interpretation.
4. Genomics:
o Analyze gene expression data or genetic variation data with thousands of features.
5. Image Compression:
o Reduce image dimensions while retaining important visual information.

PCA vs. LDA:

Aspect | PCA | LDA
Type | Unsupervised | Supervised
Goal | Maximize variance | Maximize class separability
Class Label Use | Does not use labels | Requires labelled data
Output | Principal components | Linear discriminants

Factor Analysis (FA) & Independent Component Analysis (ICA)


Factor Analysis (FA) and Independent Component Analysis (ICA) are two statistical
techniques used for analysing data by reducing dimensions and identifying latent structures.
However, they have distinct goals, methods, and assumptions.

1. Factor Analysis (FA)


Purpose:
 FA is a statistical method used to identify underlying latent variables (factors) that explain the
observed correlations among measured variables.
 It assumes that the observed variables are linear combinations of the latent factors plus some
noise.
Key Concepts:
 Latent Variables: Unobservable variables that are inferred from observed data.
 Explained Variance: FA seeks to explain as much variance as possible in the observed variables
using fewer factors.
 Noise: Each observed variable includes some error or random noise that FA accounts for.
Steps in FA:
1. Model the Data: assume each observed variable is a linear combination of the latent factors plus an error term.
2. Estimate the Factor Loadings: fit the loadings that link factors to observed variables (commonly by maximum likelihood or principal-axis factoring).
3. Extract Factors: choose the number of factors (e.g., from eigenvalues or explained variance) and compute the factor scores.
4. Rotate Factors (optional): apply a rotation such as varimax to make the loadings easier to interpret, as in the sketch below.
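A small illustration using scikit-learn's FactorAnalysis on the digits data; the choice of 10 factors and the varimax rotation are assumptions (the rotation argument needs a reasonably recent scikit-learn).
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import FactorAnalysis

X, _ = load_digits(return_X_y=True)

fa = FactorAnalysis(n_components=10, rotation="varimax")  # step 4: optional rotation
X_factors = fa.fit_transform(X)      # factor scores for each sample

print(fa.components_.shape)          # (10, 64): estimated factor loadings
print(fa.noise_variance_.shape)      # (64,): per-feature noise the model accounts for
```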
Applications:
 Psychology: To identify constructs like intelligence or personality traits.
 Market Research: To group consumer behaviors or preferences.
 Social Sciences: To find common patterns in survey responses.

2. Independent Component Analysis (ICA)

Purpose:
 ICA is a computational method used to separate a multivariate signal into additive
independent components.
 Unlike FA, which focuses on covariance, ICA focuses on statistical independence.
Key Concepts:
 Independence: The goal is to find components that are statistically independent (minimize
mutual information).
 Blind Source Separation: ICA is often used for separating mixed signals, such as in the "cocktail
party problem" (isolating individual voices from overlapping audio signals).
Steps in ICA:
1. Model the Data:
o Assume that the observed data X are linear mixtures of independent components S: X = AS, where A is the mixing matrix and S holds the independent sources.
2. Center and Whiten the Data:
o Preprocess the data by centering (mean subtraction) and whitening (decorrelation).
3. Estimate the Mixing Matrix:
o Use algorithms (e.g., FastICA, InfoMax) to estimate A and recover S such that the statistical independence of S is maximized.
4. Recover Independent Components:
o Solve for S by inverting the mixing process: S = A⁻¹X.
Applications:
 Signal Processing: Separate audio, EEG, or fMRI signals.
 Image Processing: Decompose image data into independent features.
 Finance: Identify independent drivers of financial market movements.
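A minimal "cocktail-party"-style sketch with scikit-learn's FastICA; the two synthetic sources and the mixing matrix are made-up assumptions for the demonstration (in a real problem the mixing matrix is unknown).
```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))               # source 2: square wave
S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))

A = np.array([[1.0, 0.5], [0.5, 2.0]])    # mixing matrix (unknown in real problems)
X = S @ A.T                               # observed mixtures: X = S A^T

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)              # recovered sources (up to sign, scale, order)
A_hat = ica.mixing_                       # estimated mixing matrix
print(S_hat.shape, A_hat.shape)
```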

Comparison of FA and ICA

Aspect | Factor Analysis (FA) | Independent Component Analysis (ICA)
Goal | Identify latent factors explaining correlations | Find independent components in data
Independence | Does not assume independence of factors | Assumes components are statistically independent
Model | Includes noise/error term | Assumes no explicit noise
Focus | Correlation/variance | Statistical independence
Applications | Psychology, social sciences, market research | Signal processing, neuroscience, finance
Computation | Often simpler than ICA | Computationally more demanding

Locally Linear Embedding – Isomap – Least Squares Optimization


Locally Linear Embedding (LLE), Isomap, and Least Squares Optimization are techniques
used in machine learning and data analysis, particularly for dimensionality reduction and
optimization tasks. Here’s an overview of each:
1. Locally Linear Embedding (LLE)
Purpose:
 LLE is a nonlinear dimensionality reduction technique that preserves the local structure of
the data in a lower-dimensional space.
Key Concepts:
 Local Neighbourhoods: For each data point, LLE assumes the local neighbourhood is
approximately linear.
 Linear Reconstruction: It tries to represent each data point as a weighted sum of its
neighbours.
 Global Mapping: The algorithm then finds a low-dimensional embedding that preserves
these local relationships.
Steps of LLE:
1. Find Neighbors:
2. Compute Reconstruction Weights:
3. Optimize Low-Dimensional Embedding:
Applications:
 High-dimensional data visualization.
 Manifold learning for complex, nonlinear datasets.
 Applications in image processing and bioinformatics.

2. Isomap
Purpose:
 Isomap extends classical multidimensional scaling (MDS) to nonlinear manifolds by
preserving geodesic distances instead of Euclidean distances.
Key Concepts:
 Geodesic Distance: The shortest path between two points along the manifold.
 Manifold Assumption: Assumes data lie on a low-dimensional manifold embedded in a
higher-dimensional space.
Steps of Isomap:
1. Construct Neighborhood Graph:
o Use k-nearest neighbors or a distance threshold to connect nearby points.
2. Compute Geodesic Distances:
o Approximate the geodesic distances between all pairs of points using shortest paths
in the graph (e.g., Dijkstra's or Floyd-Warshall algorithm).
3. Apply Classical MDS:
o Use multidimensional scaling to find a low-dimensional embedding that preserves
the geodesic distances.
Applications:
 Nonlinear dimensionality reduction for manifold learning.
 Image recognition, motion analysis, and gene expression studies.
Comparison with LLE:
 Isomap preserves global geometric relationships, while LLE focuses on preserving local
structures.
 Isomap uses geodesic distances, while LLE relies on linear reconstructions within
neighborhoods.
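The sketch below runs both methods on the classic Swiss-roll dataset with scikit-learn; the neighbourhood size (12) and the 2-D target dimension are illustrative assumptions.
```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding, Isomap

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)  # 3-D manifold data

# LLE: reconstruct each point from its neighbours, then keep those weights in 2-D
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_lle = lle.fit_transform(X)

# Isomap: neighbourhood graph -> shortest-path (geodesic) distances -> classical MDS
iso = Isomap(n_neighbors=12, n_components=2)
X_iso = iso.fit_transform(X)

print(X_lle.shape, X_iso.shape)   # both (1500, 2)
```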

3. Least Squares Optimization


Purpose:
 Least squares optimization is a mathematical approach used to minimize the sum of squared
differences between observed data and a model's predictions.
Key Concepts:
 Residuals: Differences between observed values and predicted values.
 Objective Function: Minimize the sum of squared residuals, Σᵢ (yᵢ − f(xᵢ, θ))², over the model parameters θ.
Applications:
 Regression (linear and nonlinear).

 Curve fitting.
 Optimization problems in machine learning, such as finding reconstruction weights in LLE.
Variants:
 Linear Least Squares:
o Solves problems where f(x, θ) is linear in the parameters θ.
o The solution follows from a system of linear equations (the normal equations): θ = (XᵀX)⁻¹Xᵀy.
 Nonlinear Least Squares:
o Used when f(x, θ) is nonlinear in θ.
o Requires iterative optimization algorithms such as gradient descent or Levenberg-Marquardt.
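A worked linear least-squares example in NumPy, fitting a noisy line by solving the normal equations; the true coefficients (3 and 2) and the noise level are made-up assumptions (np.linalg.lstsq is the numerically safer route in practice).
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, 50)   # noisy observations of y = 3x + 2

X = np.column_stack([x, np.ones_like(x)])       # design matrix with an intercept column
theta = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations: (X^T X) theta = X^T y

residuals = y - X @ theta
print("estimated slope and intercept:", theta)
print("sum of squared residuals:", np.sum(residuals ** 2))
```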

Summary of Techniques:

Technique | Purpose | Key Approach | Focus
LLE | Nonlinear dimensionality reduction | Local linear relationships | Preserve local structure
Isomap | Nonlinear dimensionality reduction | Global geodesic distances | Preserve global manifold structure
Least Squares | General optimization and regression | Minimize sum of squared errors | Model fitting and parameter estimation

Genetic algorithms
Genetic Algorithms are a class of optimization algorithms inspired by the process of natural
selection and genetics. They are widely used for solving complex optimization and search
problems where traditional methods may struggle. The core idea is to evolve a population of
candidate solutions over successive generations to find the best solution.

Key Concepts of Genetic Algorithms


1. Core Components:
 Population: A group of candidate solutions (individuals) to the problem.
 Chromosomes: Representation of a solution, often as a binary string or other encoding.
 Fitness Function: Evaluates how good a candidate solution is at solving the problem.
2. Evolution Process:

1. Initialization:
o Generate an initial population randomly or based on heuristics.
2. Selection:
o Select the fittest individuals from the population for reproduction. Common
methods include:
 Roulette Wheel Selection: Probabilistic selection based on fitness values.
 Tournament Selection: Randomly pick a subset and select the best.
 Rank Selection: Rank individuals by fitness and select based on rank.
3. Reproduction (Crossover):
o Combine the genetic material of parent solutions to produce offspring.
o Mimics biological reproduction to explore the search space.
4. Mutation:
o Introduce random changes to offspring to maintain diversity and explore new areas.
5. Replacement:
o Replace the least fit individuals with offspring to form the next generation.
6. Termination:
o Repeat the process until a stopping condition is met (e.g., maximum generations or
acceptable fitness).

Genetic Operators
1. Crossover (Recombination):
 Combines genetic material from two parents to create offspring.
 Types of crossover:
o Single-Point Crossover: Swap sections of chromosomes after a random split point.
o Multi-Point Crossover: Swap multiple sections using multiple split points.
o Uniform Crossover: Each gene is independently chosen from one of the parents.
2. Mutation:
 Introduces random changes to the offspring’s genes.
 Ensures diversity in the population and prevents premature convergence.
 Types of mutation:
o Bit Flip: Change a binary gene (0 to 1 or vice versa).
o Swap Mutation: Swap the positions of two genes.
o Gaussian Mutation: Add a random Gaussian noise to numerical genes.

3. Selection:
 Decides which individuals reproduce.
 Strategies:
o Elitism: Always retain a fraction of the best solutions.
o Diversity Preservation: Encourage diversity to avoid local optima.
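A few standalone Python functions sketching these operators; the representation (a list of 0/1 genes), the mutation rate, and the tournament size are illustrative assumptions.
```python
import random

def single_point_crossover(p1, p2):
    # swap the tails of two parent chromosomes after a random cut point
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2):
    # each gene is taken from either parent with equal probability
    return [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]

def bit_flip_mutation(chrom, rate=0.01):
    # flip each binary gene independently with probability `rate`
    return [1 - g if random.random() < rate else g for g in chrom]

def tournament_selection(population, fitness_values, k=3):
    # pick k individuals at random and return the fittest of them
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness_values[i])
    return population[best]
```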
Using Genetic Algorithms
1. Problem Encoding:
 Represent the problem as a chromosome. For example:
o Binary Encoding: Use binary strings for combinatorial problems.
o Real-Value Encoding: Use real numbers for continuous problems.
o Permutation Encoding: Use permutations for ordering problems like the Traveling
Salesman Problem (TSP).
2. Fitness Function Design:
 Define a function that quantifies the quality of a solution.
 Example for TSP: Fitness(X) = 1 / Total Distance(X).
3. Algorithm Implementation:
1. Initialize the population.
2. Evaluate fitness of individuals.
3. Select parents based on fitness.
4. Apply Crossover and Mutation to generate offspring.
5. Replace old population with new generation.
6. Repeat until stopping criteria are met.

Applications of Genetic Algorithms


1. Optimization:
o Solve nonlinear, high-dimensional, or discrete optimization problems.
o Examples: Scheduling, resource allocation, design optimization.
2. Machine Learning:
o Feature selection, hyperparameter tuning, neural network design.
3. Engineering:
o Structural design, control systems, and robotics.
4. Biology and Medicine:

o Protein structure prediction, gene sequence alignment.


5. Games and Art:
o Strategy generation, procedural content generation.

Advantages of Genetic Algorithms


 Works well with complex and poorly understood search spaces.
 Does not require gradient information or continuity.
 Can escape local optima due to stochastic operations.
Disadvantages of Genetic Algorithms
 Computationally expensive, especially for large populations.
 Performance depends on parameter tuning (e.g., mutation rate, crossover rate).
 May converge slowly or prematurely.
Example:
To minimize the function f(x) = x² using a binary representation:
1. Encode x as a binary chromosome.
2. Define fitness as −f(x) (since lower values are better).
3. Apply selection, crossover, and mutation iteratively until x ≈ 0, as in the sketch below.
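A compact, self-contained sketch of this example; the 16-bit encoding, the search interval [−10, 10], the population size, and the mutation rate are all assumptions chosen just to make the loop run.
```python
import random

BITS, POP, GENS = 16, 30, 60
LO, HI = -10.0, 10.0                       # assumed search interval for x

def decode(chrom):
    # map a binary chromosome to a real number x in [LO, HI]
    value = int("".join(map(str, chrom)), 2)
    return LO + (HI - LO) * value / (2 ** BITS - 1)

def fitness(chrom):
    x = decode(chrom)
    return -x * x                          # lower f(x) = x^2 means higher fitness

def tournament(pop):
    return max(random.sample(pop, 3), key=fitness)

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    new_pop = [max(pop, key=fitness)]      # elitism: carry the best individual over
    while len(new_pop) < POP:
        p1, p2 = tournament(pop), tournament(pop)
        cut = random.randint(1, BITS - 1)  # single-point crossover
        child = p1[:cut] + p2[cut:]
        child = [1 - g if random.random() < 0.02 else g for g in child]  # bit-flip mutation
        new_pop.append(child)
    pop = new_pop

print("best x ~", decode(max(pop, key=fitness)))   # should end up close to 0
```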

SAKSHI SHUKLA, ASSISTANT PROFESSOR, SIET, HYD
